The brilliance and poverty of artificial intelligence

We will do without long introductions or deep philosophizing about current trends in generative models in general and large language models (LLMs) in particular. Many people know all this first-hand, and those who don’t are simply not interested in what follows.

Everyone is anxiously awaiting a breakthrough. No, not just a breakthrough: a BREAKTHROUGH. LLMs have managed to surprise us over the past year, entering our lives and claiming a place of honor in them. On the horizon looms AGI (artificial general intelligence), which will arrive, wave its magic wand, and change all our lives. It will not arrive. Not yet, anyway. Yes, GPT-5 is coming soon and, the developers promise, will be head and shoulders above version 4. Multimodal models are appearing. But this is a long way from AGI. What has clearly loomed on the horizon so far is a dead end.

Alarming reports are already surfacing that there is not enough training data for new models. Models grow, inflating like soap bubbles, absorbing enormous computing power, gigawatts of energy, and terabytes of information, yet they still make numerous factual and logical errors that even a person of below-average intelligence would not make, although by benchmark scores the models claim roughly average human level. The fix would seem obvious: triple the number of parameters, add a hundred terabytes of training data, done. But it does not help. Model size grows by multiples while generation quality improves only by percentage points.

So we bolt on crutches that create the illusion of a thinking machine. The trouble is, the machine did not think before and still does not think. It is linear, like the digestive tract: from input to output. The result at the output is not analyzed or reflected upon; it is simply generated. The model cannot teach itself; it has to be taught. For every action it must be shown the result so that it remembers. Modern LLMs do not analyze the information already available to them in order to draw new conclusions, and they do not operate with facts. They take information and average it. The only way for them to grow is quantitative. But such growth requires a great deal of high-quality information, and that information is running out, with nowhere left to get more. The developers’ ambitions have hit a glass ceiling.

Humanity generates a great deal of information, but most of it is irrelevant for training: digital noise, the averaging of which yields an LLM with the abilities of an average person with an IQ of 100. Very erudite, no doubt, knowing everything or almost everything (when it isn’t lying), but intellectually a layman. Yes, you can spend millions of man-hours processing and filtering training data, and that will improve model quality, but there will be no breakthrough. We will not get a superintelligence that reveals the secrets of the universe. More likely we will get an average, watered-down polymath with a streak of pedantry.
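
As a rough illustration of why multiplying model size yields only percentage-level gains, empirical scaling laws are often approximated by a power law in parameter count, loss ≈ a · N^(−α), with a small exponent α. Here is a minimal Python sketch; the constants are invented for illustration, not fitted to any real model:

```python
# Toy illustration of diminishing returns under a power-law scaling curve.
# The constants a and alpha are made up; real scaling-law fits report
# exponents of similarly small magnitude, which is the point illustrated.

a, alpha = 10.0, 0.07  # hypothetical scale constant and scaling exponent

def loss(n_params: float) -> float:
    """Approximate test loss as a power law in parameter count."""
    return a * n_params ** -alpha

small, big = 1e9, 1e10  # 1B vs 10B parameters: a tenfold size increase
improvement = 1 - loss(big) / loss(small)
print(f"10x more parameters -> {improvement:.1%} lower loss")
# -> roughly a 15% reduction in loss for a 900% increase in size
```

With an exponent this small, a tenfold increase in parameters buys only a modest drop in loss: exactly the multiples-versus-percentages mismatch described above.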

The problem lies primarily in the linear, sequential structure of modern neural networks, which serve as the technological foundation of AI. An artificial neural network has an input and an output, with internal layers connected in series between them. This fundamentally deprives it of any capacity for introspection or reflection without additional architectural scaffolding: the network cannot “think something over” or draw independent conclusions; it immediately generates a result.
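
To make the “digestive tract” metaphor concrete, here is a minimal NumPy sketch of a feedforward pass (the layer sizes and weights are arbitrary): data flows strictly from input to output, and there is no path through which the network could revisit its own result.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 3-layer feedforward network; the weights are illustrative random values.
W1, W2, W3 = rng.normal(size=(4, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 2))

def forward(x: np.ndarray) -> np.ndarray:
    """One strictly one-directional pass: input -> hidden -> hidden -> output.

    Nothing here feeds the output back into the network; once the last
    layer fires, computation is over. Any "reflection" must be bolted on
    from outside, e.g. by feeding the output back in as a new input.
    """
    h1 = np.tanh(x @ W1)
    h2 = np.tanh(h1 @ W2)
    return h2 @ W3  # generated immediately, never re-examined

y = forward(rng.normal(size=(1, 4)))
print(y)
```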

This linear structure is in turn dictated by the training method: backpropagation of error from output to input. The problem is therefore fundamental, and it has no solution short of rewriting the architecture of artificial neural networks from the ground up. At the current stage it is patched over with crutches that police the result of generation, which will ultimately only aggravate matters, layering new problems on top of old ones: instead of a simpler structure we get a more complicated one, with errors and unpredictable behavior inevitably creeping in.
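
A minimal backpropagation sketch over a similar toy network (again with arbitrary shapes and an arbitrary learning rate) shows why the training method pins down the serial layout: the error signal must retrace the forward chain exactly, in reverse.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))
lr = 0.01  # arbitrary learning rate for the sketch

x = rng.normal(size=(1, 4))
target = np.array([[1.0, -1.0]])

# Forward: the same strictly serial chain as before.
h = np.tanh(x @ W1)
y = h @ W2
loss = ((y - target) ** 2).mean()

# Backward: the error gradient retraces the forward chain in reverse,
# output -> hidden -> input. Each step needs the layer behind it, which
# is exactly what ties the architecture to a serial stack of layers.
grad_y = 2 * (y - target) / y.size
grad_W2 = h.T @ grad_y
grad_h = grad_y @ W2.T
grad_W1 = x.T @ (grad_h * (1 - h ** 2))  # tanh'(z) = 1 - tanh(z)^2

W1 -= lr * grad_W1
W2 -= lr * grad_W2
```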

Another significant drawback of the existing architecture is the inability to learn during operation. The model is frozen at the weights obtained during the training stage; learning and generation are mutually exclusive processes.
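
In mainstream frameworks this separation is explicit rather than incidental. A short PyTorch sketch, with a placeholder model and placeholder data:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))  # placeholder model

# --- Training phase: weights change, nothing useful is served. ---
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, target = torch.randn(16, 4), torch.randn(16, 2)  # placeholder data
loss = nn.functional.mse_loss(model(x), target)
opt.zero_grad()
loss.backward()
opt.step()

# --- Deployment phase: gradients are disabled, weights are frozen. ---
model.eval()
with torch.no_grad():              # no gradient bookkeeping at all
    y = model(torch.randn(1, 4))   # the model can answer, but not learn
```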

To escape this technological impasse, we need to develop a new type of artificial intelligence, one closer to the architecture of the biological brain, not at the level of the individual neuron but at the structural level.

Undoubtedly, LLMs have shown impressive capabilities, but they are fundamentally unsuitable for building general intelligence. More likely, their capabilities will be used to solve narrow, peripheral tasks; they will not become the core of an AGI system.

I will try to formulate the requirements for the core of such a system:

1. Building a model of the world from incoming information. To a certain extent it can be argued that the current Transformer architecture already builds such a model. But, as noted above, that model is linear, and therefore useless for AGI. The world model we need must be complete, consistent, ideally interpretable and, most importantly, not linear but self-closed. Moreover, it must fundamentally be able to learn without a teacher: receiving a stream of information, analyzing it, and classifying it according to a basic set of rules.

2. The capacity for simulation. This property follows from self-closure. AGI, so to speak, “lives” inside the world model it has built and uses that model as the criterion for evaluating all incoming information and all potential actions. That is, it runs simulations, weighs the consequences, and makes decisions (a toy sketch of this loop follows the list). Indeed, if you dig deeper into the question, this artificial world model, existing in dynamics, is the AGI itself.

3. Solving, at the architectural level, the problem of short-term and long-term storage of factual information within the model itself, without artificial external structures such as context windows and the like.
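
Nothing like this exists today; the sketch below is hypothetical by construction, with every name (WorldModel, predict, score, decide) invented for illustration. It shows only the shape of requirement 2: candidate actions are rolled forward inside the agent’s own world model and compared before anything is emitted, instead of an answer being generated in a single pass.

```python
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    """Hypothetical self-closed world model (requirement 1)."""
    state: dict = field(default_factory=dict)

    def predict(self, state: dict, action: str) -> dict:
        # Invented placeholder: a real system would learn this transition.
        return {**state, "last_action": action}

    def score(self, state: dict) -> float:
        # Invented placeholder: an arbitrary utility over imagined states.
        return float(len(state.get("last_action", "")))

def decide(model: WorldModel, actions: list[str]) -> str:
    """Requirement 2: simulate each action inside the model, then choose.

    Unlike a single forward pass, the candidate results are produced,
    examined, and compared *before* anything is emitted.
    """
    best_action, best_score = actions[0], float("-inf")
    for action in actions:
        imagined = model.predict(model.state, action)  # internal rollout
        s = model.score(imagined)
        if s > best_score:
            best_action, best_score = action, s
    return best_action

print(decide(WorldModel(), ["wait", "act"]))
```

The point of the toy is the control flow, not the placeholders: generation happens only after the system has examined its own imagined outcomes.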

To summarize: at the moment there are no prerequisites for creating truly strong artificial intelligence, or almost nothing is known about them. Existing architectural solutions are only tenuously suited to the task, demanding disproportionately colossal amounts of energy and information that we are in no position to supply. We need a fundamentally new approach, a new architecture for artificial intelligence systems, one that solves the same problems more rationally. An architecture that is interpretable, and therefore controllable by humans.

Undoubtedly, the development vector of existing systems is clear, and progress will continue along this path for some time. Models will grow more complex, demanding ever more resources, until they finally hit the dead end. But that is another story.
