A new “old” step towards interpretable AI

There has been an interesting breakthrough in the world of artificial intelligence. Researchers have developed a new type of neural network that makes its own workings more transparent and understandable. These networks, called Kolmogorov-Arnold networks (KANs), are based on a mathematical principle discovered more than half a century ago.

Neural networks are today the most powerful tools in artificial intelligence. They can solve enormously complex problems by processing huge amounts of data. However, they have a significant drawback – their inner workings are opaque. Scientists cannot fully understand how exactly these networks arrive at their conclusions. In the AI world, this problem is known as the “black box”.

For a long time, researchers have wondered: is it possible to create neural networks that would produce equally accurate results, but would work in a more understandable way? And now, it seems, the answer has been found.

In April 2024, a new neural network architecture was introduced – Kolmogorov-Arnold networks. These networks can perform almost all the same tasks as regular neural networks, but their operation is much more transparent. KANs are based on a mathematical idea from the mid-20th century, adapted for the modern era of deep learning.

Although KANs are relatively new, they have already attracted a lot of interest in the scientific community. Researchers note that these networks are more understandable and can be especially useful in scientific applications. They can be used to extract scientific patterns directly from data. This opens up exciting new possibilities for scientific research.

Fitting the impossible

To understand the advantage of KAN, you need to understand how conventional neural networks work. They consist of layers of artificial neurons connected to each other. Information passes through these layers, is processed, and eventually turns into a result. The connections between neurons have different weights that determine the strength of influence. As the network learns, these weights are constantly adjusted so that the result becomes more and more accurate.
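To make this concrete, here is a minimal sketch in Python (purely illustrative, not code from the paper; the network size and numbers are arbitrary) of how a conventional network pushes an input through its layers: every connection carries a single number, and training amounts to nudging those numbers.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a tiny conventional network: the only learnable
    quantities are the numerical weights and biases W1, b1, W2, b2."""
    hidden = np.tanh(W1 @ x + b1)  # fixed nonlinearity applied at each neuron
    return W2 @ hidden + b2        # weighted sum of the hidden activations

# Three inputs, four hidden neurons, one output (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
print(mlp_forward(np.array([0.1, -0.5, 2.0]), W1, b1, W2, b2))
```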

The main task of a neural network is to find a mathematical function that best describes the available data. The more accurate this function is, the better the network's predictions. Ideally, if the network models some physical process, the function found should represent a physical law describing this process.

For regular neural networks, there is a mathematical theorem describing how closely a network can approximate the ideal function. It implies that the network can get arbitrarily close, but in general cannot represent the function exactly. KANs, under certain conditions, can.
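The theorem in question is the universal approximation theorem. In one common informal form (stated here from general knowledge, not quoted from the article), it says that a network with a single hidden layer can come arbitrarily close to any continuous target function, but only approximately:

```latex
% Universal approximation, informal statement: for any continuous target f
% on a compact domain and any tolerance eps > 0, there is a wide-enough
% one-hidden-layer network whose output stays within eps of f.
\[
  \Bigl|\, f(\mathbf{x}) \;-\; \sum_{i=1}^{N} c_i \,
  \sigma\!\bigl(\mathbf{w}_i^{\top}\mathbf{x} + b_i\bigr) \Bigr| \;<\; \varepsilon
  \qquad \text{for all } \mathbf{x} \text{ in the domain,}
\]
% for some sufficiently large number of neurons N, suitable weights
% w_i, b_i, c_i, and a fixed nonlinearity sigma.
```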

KANs work fundamentally differently. Instead of numerical weights, they place functions on the connections between neurons. These functions are nonlinear, meaning they can describe more complex dependencies. Like weights, they can be trained, but they can be adjusted with much finer precision than a single number.
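As a rough sketch of that difference (illustrative Python only; the published KAN implementation parameterizes its edge functions with B-splines, while the Gaussian basis below is just a stand-in), an edge no longer stores one number but the coefficients of a smooth, adjustable curve:

```python
import numpy as np

def edge_function(x, coeffs, centers, width=0.5):
    """A learnable *function* on a single connection: instead of one weight,
    the edge carries coefficients of a smooth basis expansion.
    (Simplified stand-in; the real KAN code uses B-spline bases.)"""
    basis = np.exp(-((x - centers) / width) ** 2)  # smooth "bumps" along the input axis
    return float(np.dot(coeffs, basis))            # training adjusts coeffs, not one weight

centers = np.linspace(-2.0, 2.0, 8)  # fixed grid where the bumps sit
coeffs = np.ones(8) * 0.1            # these play the role of the "weights"
print(edge_function(0.3, coeffs, centers))
```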

However, for a long time KANs were considered a purely theoretical construction, unsuitable for practical use. As early as 1989, a scientific paper explicitly stated that the mathematical idea underlying KANs is “inappropriate in the context of learnable networks.”

The origins of this idea go back to 1957, when mathematicians Andrey Kolmogorov and Vladimir Arnold proved a remarkable theorem. They showed that any continuous function of many variables can be represented as a combination of simple functions of a single variable.
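In modern notation, the Kolmogorov-Arnold representation theorem says that any continuous function of n variables can be written using only functions of one variable and addition:

```latex
\[
  f(x_1, \dots, x_n) \;=\; \sum_{q=1}^{2n+1}
  \Phi_q\!\Bigl( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \Bigr),
\]
% where every outer function \Phi_q and every inner function \varphi_{q,p}
% depends on a single variable; addition is the only way they are combined.
```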

Andrey Kolmogorov (top) and Vladimir Arnold proved in 1957 that a complex mathematical function can be rewritten as a combination of simpler ones.

There was one problem, however. The simple functions obtained as a result of applying the theorem could be “non-smooth”, i.e. have sharp corners. This created difficulties for constructing a trainable neural network based on them. After all, for successful training, the functions must be smooth so that they can be smoothly adjusted.

So the idea of KANs remained a theoretical possibility for a long time. But that all changed in January 2024, when MIT physics graduate student Ziming Liu took up the topic. He had been working on ways to make neural networks more understandable for scientific applications, but all his attempts had ended in failure. So Liu decided to return to the Kolmogorov-Arnold theorem, even though it hadn’t received much attention before.

His supervisor, physicist Max Tegmark, was initially skeptical. He was familiar with the 1989 paper and thought the effort would again hit a dead end. But Liu persisted, and Tegmark soon changed his mind. They realized that even if the functions the theorem generated were not smooth, the network could still approximate them with smooth functions. In fact, most functions encountered in science are smooth, meaning that a perfect (rather than approximate) representation of them should be theoretically possible.

Liu was reluctant to give up on the idea without trying it out. He realized that computing power had advanced dramatically in the 35 years since the 1989 paper. What seemed impossible then might well be possible now.

Liu worked on the idea for about a week, developing several prototypes of KANs. All of them had two layers, the simplest structure that researchers have focused on for decades. The choice of a two-layer architecture seemed natural, since the Kolmogorov-Arnold theorem itself essentially provides a blueprint for such a structure. The theorem breaks down a complex function into separate sets of internal and external functions, which corresponds well to the two-layer structure of a neural network.
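The correspondence can be sketched directly in code (an illustrative toy in Python, not the authors’ implementation; the hand-picked sine, cosine and tanh functions simply stand in for the learned ones): the first layer applies the “inner” one-variable functions and sums them, and the second layer applies the “outer” ones.

```python
import math

def kan_two_layer(x, inner, outer):
    """Two-layer KAN-style forward pass mirroring the theorem:
    inner[q][p] and outer[q] are one-variable functions (learnable in a real KAN)."""
    n = len(x)
    total = 0.0
    for q in range(2 * n + 1):                        # the 2n + 1 groups from the theorem
        s = sum(inner[q][p](x[p]) for p in range(n))  # inner functions, combined by addition
        total += outer[q](s)                          # outer function applied to the sum
    return total

# Toy usage: n = 2 inputs, so 2n + 1 = 5 groups of functions.
x = [0.4, 1.2]
inner = [[math.sin, math.cos] for _ in range(5)]
outer = [math.tanh] * 5
print(kan_two_layer(x, inner, outer))
```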

But to Liu’s disappointment, none of his prototypes performed well in the scientific problems he was hoping to solve. Then Tegmark came up with a key idea: What if we tried a KAN with more layers? Such a network could handle more complex problems.

Ziming Liu used the Kolmogorov-Arnold theorem to build a new type of neural network.

This unconventional idea turned out to be a breakthrough. Multilayer KANs began to show promise, and Liu and Tegmark soon recruited colleagues from MIT, Caltech, and Northeastern University. They wanted to assemble a team that included both mathematicians and experts in the fields where they planned to apply KANs.

In an April paper, the group demonstrated that three-layer KANs are indeed possible. They gave an example of a three-layer KAN that could accurately represent a function that a two-layer network could not. But the researchers didn’t stop there. Since then, they have experimented with networks of up to six layers, and with each additional layer the network could represent increasingly complex functions. “We found that we could add as many layers as we wanted,” said one of the paper’s co-authors.
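Schematically (the notation here is adapted from the KAN paper’s general form, not quoted verbatim), a deeper KAN is simply a composition of such layers, each of which is a grid of one-variable functions rather than a matrix of numbers:

```latex
\[
  \mathrm{KAN}(\mathbf{x}) \;=\;
  \bigl(\boldsymbol{\Phi}_{L} \circ \boldsymbol{\Phi}_{L-1} \circ \cdots \circ
        \boldsymbol{\Phi}_{1}\bigr)(\mathbf{x}),
\]
% where each layer \Phi_l maps its inputs to its outputs by applying a
% learnable one-variable function on every connection and summing the results.
```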

Proven improvements

The authors also applied their networks to two real-world problems. The first was in a field of mathematics known as knot theory. In 2021, the DeepMind team created a conventional neural network that could predict a specific topological property of a knot based on its other properties. Three years later, the new KAN not only repeated this achievement, but went further. It was able to show how the predicted property related to all the others, something conventional neural networks cannot do.

The second task involved a phenomenon in condensed matter physics called Anderson localization. The goal was to predict the boundary at which a certain phase transition occurs, and then derive a mathematical formula describing this process. No conventional neural network had ever been able to do this. KAN did it.

But the main advantage of KANs over other types of neural networks is their interpretability. That was the main motivation for their development, according to Tegmark. In both examples, KANs didn't just produce an answer, they also provided an explanation. “What does it mean for something to be interpretable? If you give me some data, I'll give you a formula that you can write on a T-shirt,” Tegmark explained.

Max Tegmark, Liu’s advisor, made the key suggestion that got Kolmogorov-Arnold networks working.

This ability of KANs, while still limited, suggests that they could theoretically teach us something new about the world around us, said Brice Ménard, a physicist at Johns Hopkins University who studies machine learning. “If the problem is truly described by a simple equation, KANs are pretty good at finding it,” he said. But Ménard cautioned that the domain where KANs work best is likely to be limited to problems like those found in physics, where equations typically contain very few variables.

Liu and Tegmark agree, but they don't see it as a drawback. “Almost all known scientific formulas, like E = mc², can be written in terms of functions of one or two variables,” Tegmark said. “The vast majority of calculations we do depend on one or two variables. KANs exploit this fact and look for solutions in this form.”

Final equations

Liu and Tegmark's paper on KAN quickly created a stir in the scientific community, amassing 75 citations in just three months. Soon, other research groups began working on their own versions of KAN.

In June, Yizheng Wang of Tsinghua University and colleagues showed that their Kolmogorov-Arnold-informed neural network (KINN) “significantly outperforms” conventional neural networks at solving partial differential equations. That is a significant achievement, since such equations are ubiquitous in science.

A study published in July by researchers at the National University of Singapore yielded more mixed results. They found that KANs outperformed conventional networks on tasks where interpretability was important. However, on tasks such as computer vision and audio processing, conventional networks performed better. On natural language processing and other machine learning tasks, both types of networks performed about the same. For Liu, these findings were not surprising: the KAN team had initially focused on “science-related tasks,” where interpretability was the top priority.

Meanwhile, Liu is working to make KANs more practical and easier to use. In August, he and his colleagues published a new paper called “KAN 2.0”. Liu described it as “more like a user manual than a typical scientific paper.” He said the new version is more user-friendly and offers new features, such as a multiplication tool, that were missing from the original model.

Liu and his co-authors argue that this type of network is more than just a problem-solving tool. KANs advance what the group calls “curiosity-driven science,” which complements the “application-driven science” that has long dominated machine learning.

For example, when studying the motion of celestial bodies, application-oriented researchers focus on predicting their future positions, while curiosity-driven scientists hope to uncover the fundamental physics behind that motion. Liu believes that with KANs, researchers could get much more out of neural networks than just helping them solve complex computational problems. Instead, they could focus on gaining a deep understanding of the phenomena being studied for their own sake.

This approach opens up exciting prospects for science. KANs can become a powerful tool not only for predicting outcomes, but also for uncovering hidden patterns and principles underlying various natural and technical processes.

Of course, KANs are still in their early stages, and there are still many challenges to overcome before they can reach their full potential. But it is already clear that this new neural network architecture can significantly change the way artificial intelligence is used in scientific research.

The ability to “look inside” the work of a neural network, to understand the logic behind its conclusions, is something that scientists have dreamed of since the advent of deep learning technology. KANs take an important step in this direction, offering not just accurate predictions, but also understandable explanations.

This could lead to a real breakthrough in various fields of science. Imagine a neural network that not only predicts the weather with high accuracy, but also derives new meteorological laws. Or not only recognizes cancer cells in images, but also formulates new hypotheses about the mechanisms of tumor development.

Of course, KAN is not a universal solution to all problems. The technology has its limitations and areas where it may be less effective than traditional neural networks. But in the field of scientific research, especially where it is important not only to obtain a result but also to understand how it was achieved, KAN can become an indispensable tool.

The work by Liu, Tegmark, and their colleagues opens a new chapter in the history of artificial intelligence. It shows that sometimes, to move forward, you need to look back and rethink old ideas. A theorem proven by Kolmogorov and Arnold more than half a century ago has found an unexpected application in the era of deep learning, offering a solution to one of the most difficult problems in modern AI.

The future of KAN looks promising. As researchers continue to experiment with this architecture, new possibilities and applications are opening up. We may be on the cusp of a new era in artificial intelligence—an era where machines not only provide answers, but also help us understand why those answers are correct.

Ultimately, the goal of science is not just to predict phenomena, but to understand them. KANs offer a path to this understanding by combining the power of modern computing with the transparency and interpretability of classical mathematics. This fusion can lead to new discoveries and insights that were previously unavailable.

So next time you hear about a breakthrough in artificial intelligence, remember KANs. These networks can not only solve complex problems, but also explain their solutions in human-readable language. And who knows, maybe KANs will help us solve nature’s next great mystery by writing its formula on a T-shirt, as Max Tegmark dreams.
