Real examples of how AI optimizes chip design

Routing congestion on an integrated circuit: too many nets pass through individual regions (shown in red)
Today, Russia is considering plans to create a full-fledged chip-manufacturing infrastructure: to organize 300 design centers staffed with at least 100 specialists, and to launch new fabs. In this context it is interesting to look at which ML methods are used in the design of modern microelectronics; some of them could be adopted for domestic projects.
Machine learning (ML) models are naturally well suited to IC design: today’s chips with billions of transistors are among the most complex systems ever built, and their parameters are becoming difficult to compute by conventional means.
Moreover, IC manufacturing is itself statistical in nature, which designers also have to take into account.
Obviously, AI can be applied at different levels of design. For calibration, for example, see the article “Using Machine Learning to Calibrate Analog ICs” (“Bulletin of the Voronezh State Technical University”).
Another obvious approach is to take existing CAD tools and attach ML modules to them to optimize individual calculations. Here neural networks speed up computations significantly, in some cases by several orders of magnitude.
Recently, more specialized AI/ML tools for CAD have appeared on the market, such as those from Synopsys, the Cerebrus Intelligent Chip Explorer from Cadence Design Systems, and others. In its promotional slides Cadence claims that with these intelligent design tools a single engineer can quickly complete work that previously took a team of designers many months of manual labor.
Of course, that sounds like marketing. But what do the engineers who actually design chips with such tools say?
Nvidia R&D lead researcher Bill Dally, in his talk at the GTC 2022 conference, gave some interesting examples of how Nvidia uses ML in the development and optimization of chips. This is the most up-to-date information straight from Nvidia’s R&D design center, which now employs about 300 engineers. The techniques are described in four scientific papers referenced under the illustrations (1, 2, 3, 4), plus a paper on NVCell.
For example, one of the AI tools in the CAD flow takes an electrical circuit map of a GPU chip and estimates how the voltage will drop in different situations (IR drop estimation). Such a calculation takes three hours in a typical commercial CAD tool, while the model completes it in 18 minutes (feature extraction) plus 3 seconds (inference), Dally said.
As shown in the illustration, after appropriate training the ML model effectively learns the coefficients and reaches 94% accuracy (with the three-second inference).
In this case, it is not a convolutional but a graph neural network (GNN) that is used. It estimates the switching activity of the various nodes in the circuit, from which the final voltage estimate is derived.
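For intuition, here is a minimal sketch of the idea (not Nvidia’s actual model): one round of graph message passing over a toy power-grid graph, where each node’s IR-drop estimate is derived from its own features and those of its neighbours. The graph, the node features, and the weights are all invented placeholders; in a real flow the weights would be trained on simulated IR-drop maps.

```python
# Minimal sketch (not Nvidia's actual model): one round of graph message passing
# over a toy power-grid graph to predict per-node IR drop. Node features such as
# switching activity and current draw are invented placeholders; the weights
# would normally be learned from simulated IR-drop maps.
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 6 power-grid nodes, symmetric adjacency matrix.
A = np.array([
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
], dtype=float)

# Per-node features: [switching activity, current draw, distance to power pad].
X = rng.random((6, 3))

# Randomly initialised weights stand in for trained parameters.
W_self, W_neigh = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))
w_out = rng.normal(size=(8,))

def relu(x):
    return np.maximum(x, 0.0)

# One message-passing layer: each node aggregates its neighbours' features,
# then a linear readout maps the hidden state to a scalar IR-drop estimate.
deg = A.sum(axis=1, keepdims=True)
H = relu(X @ W_self + (A @ X) / np.maximum(deg, 1) @ W_neigh)
ir_drop = H @ w_out

print("Predicted IR drop per node (arbitrary units):", np.round(ir_drop, 3))
```

The point of this structure is that, once the weights are trained, inference is just a few matrix multiplications over the graph, which is why the final prediction can take seconds rather than hours.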
Bill Dally says that predicting parasitics with graph neural networks is his favorite use of ML, because he himself has spent a lot of time calculating parasitic characteristics by hand. In the old days, circuit design was an iterative process. The circuit designer drew a schematic with every transistor on it, as in the lower-left illustration.
But you did not know how it would actually behave until a layout designer took that schematic, produced the layout, and extracted the parasitics. Only then could you run a circuit simulation and discover that the design did not meet some of the specifications. You had to go back, change the schematic, and send it off again for physical layout verification. A very long, iterative and extremely time-consuming process.
Now you can train neural networks to predict the parasitics without having to produce a layout at all. As the graph above shows, the model’s predictions are quite accurate.
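As a hedged illustration of the workflow change (not the published Nvidia model), the sketch below replaces the graph neural network with a plain least-squares regression: it learns per-net parasitic capacitance from schematic-level features and feeds the prediction into a crude RC delay estimate, so timing can be sanity-checked before any layout exists. All feature names and numbers are made up.

```python
# Hedged illustration (not the published Nvidia flow): a tiny regression model that
# predicts per-net parasitic capacitance directly from schematic-level features,
# so a designer can estimate delays before any layout exists. The feature names
# and the training data are invented for the example.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "layout-extracted" training data: features per net are
# [fanout, driver strength, estimated wire length class]; the target is
# parasitic capacitance in fF (made up for the sketch).
X_train = rng.random((200, 3))
true_w = np.array([3.0, -1.0, 5.0])
y_train = X_train @ true_w + 0.1 * rng.normal(size=200)

# Fit ordinary least squares as a stand-in for the graph neural network.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Predict parasitics for nets of a new, not-yet-laid-out schematic ...
X_new = np.array([[0.2, 0.8, 0.1], [0.9, 0.3, 0.7]])
cap_ff = X_new @ w

# ... and plug them into a simple RC delay estimate (0.69 * R * C) so timing can
# be checked without waiting for physical layout and extraction.
driver_r_kohm = np.array([1.5, 0.8])
delay_ps = 0.69 * driver_r_kohm * cap_ff
print("Predicted parasitic capacitance (fF):", np.round(cap_ff, 2))
print("Estimated net delays (ps):", np.round(delay_ps, 2))
```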
In addition, ML models are used to estimate routing congestion and to find potential bottlenecks in routing on the IC. This is very important when laying out chips.
In the usual flow, you have to take the netlist and run it through the place-and-route process, which takes a long time, often several days. Only then do you get the actual congestion picture, and only then do you realize that the original design will not do: it has to be refactored, routing nets around the red areas (in the illustration) through which too many nets pass.
The graph network model takes the netlist and predicts the congested areas. It does not work perfectly, but it shows roughly where the problems may be, which in many cases is enough to speed up iterations. As a result, a chip design iteration shrinks from several days to several minutes.
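Since the details of Nvidia’s congestion model are not given here, the sketch below uses a classic bounding-box (“RUDY”-style) estimate as a stand-in: each net’s routing demand is spread over its bounding box on a coarse grid, and tiles whose demand exceeds capacity are flagged as likely hotspots. The netlist and capacity value are toy data invented for the example.

```python
# A hedged stand-in for the learned congestion predictor: a classic bounding-box
# ("RUDY"-style) estimate that spreads each net's routing demand over its bounding
# box on a coarse grid and flags tiles whose demand exceeds capacity.
import numpy as np

GRID = 8          # coarse congestion grid (GRID x GRID tiles)
CAPACITY = 1.0    # routing capacity per tile (arbitrary units)

# Toy netlist: each net is a list of (x, y) pin locations in grid coordinates.
nets = [
    [(0, 0), (3, 2)],
    [(1, 1), (6, 1), (4, 5)],
    [(2, 2), (3, 3)],
    [(2, 1), (5, 6), (6, 2)],
]

demand = np.zeros((GRID, GRID))
for pins in nets:
    xs, ys = zip(*pins)
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    w, h = x1 - x0 + 1, y1 - y0 + 1
    # Approximate wirelength (half-perimeter) spread evenly over the bounding box.
    demand[y0:y1 + 1, x0:x1 + 1] += (w + h) / (w * h)

hotspots = np.argwhere(demand > CAPACITY)
print("Predicted congestion hotspots (row, col):", hotspots.tolist())
```

A trained graph model plays the same role but predicts the post-routing congestion map from netlist features, without the placement step this heuristic still implicitly assumes.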
All of the approaches above work on the same principle: they catch errors in a design that a human has already produced. It is much more interesting to look at examples where the AI produces designs of its own.
Bill Dally gave two such examples. The first is a proprietary system called NVCell (simulated annealing plus reinforcement learning) for building the standard cell library. Nvidia uses it every time it moves to a new process node, for example from 7 nm to 5 nm. What is a cell? It is the basic element of a digital circuit, such as an AND or OR logic gate or a full adder. The library contains many thousands of such cells, which have to be redesigned for each new manufacturing technology with its very complex set of design rules.
“Basically, we use reinforcement learning to place the transistors,” says Dally. “But after they are placed, there is usually a pile of design-rule errors, and it plays out almost like an Atari game [most likely Dally is referring to the well-known examples of a DeepMind neural network teaching itself to win old Atari games by trial and error — translator’s note]. It’s like Atari, except our video game is fixing design-rule errors in a standard cell. By going through and fixing these design-rule errors with reinforcement learning, we can essentially finish the design of our standard cells.”
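To make the “video game” framing concrete, here is a toy version of the loop (a greedy search standing in for the reinforcement-learning policy, and nothing like NVCell’s real design rules): the state is a one-dimensional strip of transistor positions, the only “design rule” forbids two transistors in adjacent slots, and the agent keeps applying whichever single move removes the most violations.

```python
# Toy illustration of "fixing design-rule errors as a game", not NVCell itself:
# a greedy agent (standing in for the RL policy) repeatedly applies the single
# transistor move that most reduces the number of rule violations.
import random

random.seed(0)
SLOTS = 12
layout = sorted(random.sample(range(SLOTS), 6))   # initial placement of 6 transistors

def violations(placement):
    """Count adjacent pairs, our stand-in for design-rule errors."""
    s = sorted(placement)
    return sum(1 for a, b in zip(s, s[1:]) if b - a < 2)

while violations(layout) > 0:
    best = None
    # Enumerate single-transistor moves and pick the one that minimises violations.
    for i, pos in enumerate(layout):
        for new_pos in range(SLOTS):
            if new_pos in layout:
                continue
            candidate = layout[:i] + [new_pos] + layout[i + 1:]
            v = violations(candidate)
            if best is None or v < best[0]:
                best = (v, candidate)
    if best[0] >= violations(layout):
        break                                      # no improving move left
    layout = best[1]

print("Final layout:", sorted(layout), "violations:", violations(layout))
```

In the real system the “moves” are edits to an actual cell layout, the rules come from the foundry’s design-rule deck, and a learned policy rather than exhaustive greedy search chooses the next fix.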
The last slide shows that 92% of the cell library was produced by this tool without errors. In 12% of cases, the generated cells are smaller than the human-designed ones. Overall, in terms of cell complexity, NVCell performs no worse than a human.
According to Nvidia’s lead researcher, building such a library by hand used to take a team of ten people nearly a year; now it takes just a few days of compute on two GPUs. And in many cases the resulting design is better than a human’s.
Of course, people will not be left without work. They can turn their attention from routine tasks to more complex and interesting ones, for example to the 8% of cells that NVCell could not design automatically.
The second example of reinforcement learning is the PrefixRL system for designing parallel prefix circuits. Here the design itself effectively plays the role of an agent acting in an environment (the physical circuit), and the reward is built around minimizing chip area and signal propagation delay.
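To show what such an agent is optimizing, the sketch below compares two classic parallel-prefix topologies for an 8-input prefix computation by “area” (number of prefix operators) and “delay” (logic levels). This is only an illustration of the trade-off space, not the Nvidia agent itself.

```python
# Illustration of the design space PrefixRL explores (not the Nvidia agent itself):
# two classic parallel-prefix topologies compared by operator count and logic depth.
N = 8

def serial_prefix(n):
    # Ripple chain: minimal area (n-1 operators) but delay grows linearly.
    return {"ops": n - 1, "levels": n - 1}

def kogge_stone_prefix(n):
    # Kogge-Stone: log2(n) levels, but many more operators (more area and wiring).
    ops, levels, d = 0, 0, 1
    while d < n:
        ops += n - d          # operators added at this level
        levels += 1
        d *= 2
    return {"ops": ops, "levels": levels}

for name, stats in [("serial", serial_prefix(N)), ("kogge-stone", kogge_stone_prefix(N))]:
    print(f"{name:12s} area={stats['ops']:2d} ops, delay={stats['levels']} levels")
```

An RL agent of the kind described would add and remove prefix nodes to trade these two numbers against each other, with reductions in area and delay serving as the reward.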
A video recording of Dally’s talk (42 min 40 s) is published on the GTC 2022 conference website and is available after free registration in the Nvidia Developer Program.
A PDF with the presentation slides is available here.
Presenter Bill Dally in his home lab
P.S. In theory, if AI could design and optimize chips entirely on its own, that would be a straight road to the singularity. But we are still far from that: in practice, developing each ML model requires the work of qualified specialists, and there is no way around it.