How knowledge graphs and LLMs can help each other

Pre-trained language models generate text whose quality is comparable to human writing (and sometimes surpasses it). But even the best LLMs share one problem: the network does not understand what it is saying. The output can be masterful in grammar and vocabulary yet still wrong in meaning.

A well-known recent example is what its authors dubbed the "reversal curse." Even GPT-4 may fail to construct the inverse of a simple factual statement. When asked for the name of Tom Cruise's mother, GPT-4 answers correctly (Mary Lee Pfeiffer). But when asked for the name of Mary Lee Pfeiffer's son, it does not know.

One can argue that this is not a curse but a natural, even necessary, property of the network. Nodes with few connections (and Mary Lee Pfeiffer is surely rare in the training data) have to be ignored, otherwise the output would drown in a mass of irrelevant information. That is, if B follows from A but the data for A and B is heavily imbalanced, the network should not build the reverse link from B to A. In other words, it is not a bug but a feature.

Such arguments have merit, but I would still like such a powerful model to handle a simple inversion that even a child can perform. Another example: GPT-4 is noticeably worse than humans at solving simple logic problems.

The idea of using knowledge graphs (KG) and LLMs together appeared several years ago. A knowledge graph is a set of triples: object 1, object 2, and the type of relation between them. Combining KGs and LLMs arises naturally from their peculiar kinship: graphs consist of words and the links between them, while an LLM predicts a word by building links (albeit of a completely different, statistical kind) to the preceding tokens. Several lines of work have therefore become active at once: LLMs are used to build knowledge graphs, and knowledge graphs are used to improve LLMs. In the latter case, KGs are applied in different ways: before training (expanding the input data, optimizing masking), during training (integrating them into encoders or adding separate layers), and after training (fine-tuning).
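To make the triple structure concrete, here is a minimal sketch of a knowledge graph stored as (object 1, relation, object 2) triples and indexed in both directions; the facts and the has_mother relation name are invented for illustration. A symbolic store like this answers the inverse question from the reversal-curse example for free, which is part of why KGs are attractive as a complement to LLMs.

```python
# A minimal sketch of a knowledge graph as a set of (head, relation, tail) triples.
# The triples and relation name below are illustrative, not taken from any real KG dump.
from collections import defaultdict

triples = [
    ("Tom Cruise", "has_mother", "Mary Lee Pfeiffer"),
    ("Top Gun", "stars", "Tom Cruise"),
]

# Index the graph in both directions: unlike a purely statistical language model,
# a symbolic store answers the inverse question ("who is X's son?") for free.
by_head = defaultdict(list)
by_tail = defaultdict(list)
for h, r, t in triples:
    by_head[(h, r)].append(t)
    by_tail[(t, r)].append(h)

print(by_head[("Tom Cruise", "has_mother")])         # ['Mary Lee Pfeiffer']
print(by_tail[("Mary Lee Pfeiffer", "has_mother")])  # ['Tom Cruise'] -- the reverse link comes for free
```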

One significant question is which graph to use. Ideally one would take, roughly speaking, the complete knowledge graph of all of Wikipedia, so that any model response could be checked against it. But in that case, as one might expect, irrelevant information turns into noise and degrades quality, while manually selecting the needed part of the graph is hopeless in terms of speed and scaling.

CogNLG aims to solve these problems. The authors were inspired by a theory from cognitive psychology about the two subsystems that make up human thinking: the first gathers information quickly and unconsciously, the second analyzes it slowly and deliberately. CogNLG likewise consists of two systems. The first system actually generates the next token (the authors used GPT-2 in this role), but it relies on knowledge already filtered by the second system. The authors compare this to how a person writes a text: the relevant facts are collected first, and only then are the most suitable ones dynamically selected for the given context in real time. That is precisely the job of the second system, which is built on a graph convolutional network: to construct the graph needed for generating the next token.

Here lies an important difference from other similar approaches: the external knowledge is not static but is formed dynamically for each position. The input data defines the initial nodes of the graph (source entities); then a depth is set and additional nodes (extension entities) connected to the source entities within that depth are taken from the external knowledge graph. The hidden states of the resulting graph, together with the semantics obtained from the first system, are fed into a prediction layer. It produces the filtered knowledge: the best nodes with their connections, plus a summary in which the triples from the graph are rendered as simple sentences, taking parts of speech and subject-relation-object roles into account.
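To illustrate the pipeline of the second system, here is a simplified sketch: expand the external KG from the source entities to a fixed depth, score the candidate triples against the current context, and verbalize the survivors as simple sentences. The word-overlap scoring is only a toy stand-in for the graph convolutional network and prediction layer described above, and all entity and relation names are invented.

```python
# A simplified sketch of dynamic knowledge selection in the spirit of CogNLG's second system.
# The external KG, entities, and relations below are invented for illustration.
from collections import defaultdict

KG = [  # external knowledge graph: (head, relation, tail)
    ("Ada Lovelace", "field", "mathematics"),
    ("Ada Lovelace", "collaborated_with", "Charles Babbage"),
    ("Charles Babbage", "designed", "Analytical Engine"),
    ("mathematics", "studied_at", "Cambridge"),
]

adj = defaultdict(list)
for h, r, t in KG:
    adj[h].append((r, t))

def expand(source_entities, depth):
    """Collect triples reachable from the source entities within `depth` hops."""
    frontier, selected = set(source_entities), []
    for _ in range(depth):
        next_frontier = set()
        for h in frontier:
            for r, t in adj[h]:
                selected.append((h, r, t))
                next_frontier.add(t)
        frontier = next_frontier
    return selected

def score(triple, context):
    """Toy relevance score: word overlap between the triple and the current context."""
    words = set(" ".join(triple).lower().replace("_", " ").split())
    return len(words & set(context.lower().split()))

def verbalize(triple):
    """Render a triple as a simple subject-relation-object sentence."""
    h, r, t = triple
    return f"{h} {r.replace('_', ' ')} {t}."

context = "Ada Lovelace worked on the Analytical Engine"
candidates = expand(["Ada Lovelace"], depth=2)
top = sorted(candidates, key=lambda tr: score(tr, context), reverse=True)[:2]
print([verbalize(tr) for tr in top])
```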

CogNLG was tested on the ENT-DESC dataset, collected from almost 10 million Wikipedia pages and the Person and Animal categories of the Wikidata knowledge graph. It surpassed MGCN, the strongest model at the time, on almost all metrics.

Another area where knowledge graphs meet neural networks is graph learning. Unlike an LLM, which is trained and run over a fixed vocabulary, graph learning has no such shared vocabulary: different graphs contain different entities and relations that, in general, do not overlap. A recent interesting result here is ULTRA (unified, learnable, and transferable), a method for bringing different knowledge graphs "to a common vocabulary", so that, for example, a model can be trained on one graph and applied to another.

In the classical setup, graph representations are built around entities: entities are nodes, and the relations between them are edges. The relation types are fixed for a given graph, so what is learned for them does not carry over to a graph with a different set of relations. ULTRA takes any such graph and builds on top of it a graph of relations, in which the nodes are the relations of the original graph.

A graph neural network is applied to this relation graph, and the output is relative representations for each relation; in effect, we learn how the relations themselves are related to each other. For example, the input node of the "authored" relation can be the output node of the "genre" relation. In total, four types of such "meta-connections" (connections between connections) can be distinguished: input-output, input-input, output-output, and output-input. Thus it is not the relations themselves that become invariant, but their meta-representations, and the rewritten graph can then be used with any inductive learning method. The authors hooked ULTRA up to NBFNet, which predicts links, and showed that after training on three KGs it transfers to more than 50 other graphs.
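Below is a minimal sketch of the relation-graph construction on a toy KG with invented triples: nodes are relations, and two relations are linked whenever they share an entity in the input (head) or output (tail) position, which yields the four meta-connection types. The actual ULTRA model then runs a graph neural network (NBFNet) over this graph; that part is omitted here.

```python
# A toy illustration of an ULTRA-style relation graph: nodes are the relations of the
# original KG, and edges mark how two relations share entities. Triples are invented.
from collections import defaultdict
from itertools import product

KG = [  # (input entity, relation, output entity)
    ("Plath", "authored", "The Bell Jar"),
    ("The Bell Jar", "genre", "novel"),
    ("Plath", "born_in", "Boston"),
]

inputs, outputs = defaultdict(set), defaultdict(set)
for h, r, t in KG:
    inputs[r].add(h)
    outputs[r].add(t)

# The four "meta-connection" types between two relations r1 and r2, depending on
# whether they share an entity in the input or output position.
meta_edges = set()
for r1, r2 in product(inputs, repeat=2):
    if r1 == r2:
        continue
    if inputs[r1] & inputs[r2]:
        meta_edges.add((r1, "input-input", r2))
    if inputs[r1] & outputs[r2]:
        meta_edges.add((r1, "input-output", r2))
    if outputs[r1] & inputs[r2]:
        meta_edges.add((r1, "output-input", r2))
    if outputs[r1] & outputs[r2]:
        meta_edges.add((r1, "output-output", r2))

for edge in sorted(meta_edges):
    print(edge)
# ('authored', 'input-input', 'born_in')  -- both relations start from the same entity
# ('authored', 'output-input', 'genre')   -- the book 'authored' points to is where 'genre' starts
# ('born_in', 'input-input', 'authored')
# ('genre', 'input-output', 'authored')
```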

More of our AI article reviews are available on the Pro AI channel.
