The second wind of the relational model


The relational data model has been with us for almost half a century. It is not surprising that during this time she managed to get a little bored. The old dispute between the network and relational models was about to flare up with renewed vigor, but then the unexpected happened.

Story

Let me remind you what was the main advantage of the relational model over the network one. Without much thought, the advantage was that the creators of the relational model intended that you wouldn’t have to code to get the data from the database. As you can see, the idea existed long before low-code and zero-code became popular. We give you SQL, the creators said. And this is almost ordinary human language. There are understandable English words. Well, there are some rules, but they will be easy to learn. And then all the benefits of using databases will become available to everyone.

In fact, it turned out to be difficult for everyone and everyone to learn the rules. For the majority, this “almost” in the phrase “almost ordinary human language” turned out to be an insurmountable obstacle. But the relational model won anyway. Due to the fact that the SQL language still saved programmers a lot of energy. Let zero-code not come out, but low-code turned out to be very good.

As time went on, everyone got used to this low-code in SQL, and partly stopped distinguishing it. At the same time, they began to recall the forgotten advantages of the network model. The concept of NoSQL appeared. At first, No meant… No. In the sense, “well, that’s enough.” Then they nevertheless realized that they got excited and it would not work without SQL at all. Therefore, they began to explain to everyone that No is actually Not only. Be that as it may, the network model began to gradually return the positions it had lost.

Turn

But then a technological miracle happened in the form of large language models and GPT. And it turned out that in the same phrase “almost ordinary human language” now you can safely remove “almost”. Any person, without any preparation, can formulate a query to the database in his usual language and immediately get the result.

How did it happen. Oversimplifying greatly, we can say that the result produced by the language model depends on how much has been previously discussed on your question on the Internet and which of this fell into the training sample. There was a lot of talk about any conceivable query in SQL, and almost all of it fell into the training sample. In principle, the same applies to the most common programming languages. But there is a significant difference. If you ask the language model to write a program, then almost always the result will need to be “finished” a little, otherwise it will not start. If you ask to write a SQL query, then almost always this query can be sent for execution as is.

This phenomenon will certainly be investigated in the future. For now, we just have it as a fact.

Examples

In order not to be unfounded, I will give a few examples.

Explanations for examples

These are all real examples of questions asked by users of a natural language data access service. The results obtained were translated into the 1C:Enterprise query language and executed in this environment. Each question is supplemented in front with a description of the tables. I removed this addition for greater readability. End users don’t see it anyway. The content of the append is obviously inferred from the text of the SQL query. Here I give the texts of SQL queries, but the result is not translated into 1C:Enterprise. It seems to me that it will be clearer to everyone.

I understand that something very simple, such as “show the remnants of such and such a product in the warehouse” does not surprise anyone. So let’s take something more interesting. For example, this question:

What is the best seller this year

language model response

SELECT Product name, SUM(Sales.sum) AS revenue
FROM Sales
JOIN Item ON Sales.item_id = Item.id
WHERE YEAR(Sales.date) = YEAR(CURRENT_DATE())
GROUP BY Product name
ORDER BY revenue DESC
LIMIT 10

It should be noted right away that this request is working. Those. it can be sent as is for execution. What is even more remarkable, we get a working answer to a rather vague question. The language model does not “break” and does not “strike”. She just takes it and gives herself a completely reasonable option.

The language model can “chew” and something more intricate

List of unique buyers of the article Ch-8889 in 2023 from those who had less than 3 purchases of any goods in 2022

With the help of such a question, someone tried to prove the fundamental inoperability of the language model. In fact, there is a certain “ambush” in this issue. Apparently, the opponent himself solved this kind of problem and “stepped on a rake.” He made a query that answered the first part of the condition, and for the second part he used the condition to enter the list. Like, we select unique buyers of the article Ch-8889, and then we look at whether each of these buyers is on the list of those who made less than 3 purchases in 2022. If you have not yet guessed what the matter is, I will tell you that this list will not include those who made 0 purchases and the result will be wrong.

The language model copes with this task without problems. Her answer

SELECT Accounts.id, Accounts.name
FROM Sales
JOIN Accounts ON Sales.account_id = Accounts.id
WHERE Sales.product_id = (SELECT id FROM Product WHERE SKU = ‘W-8889’)
AND YEAR(Sales.date) = 2023
AND Accounts.id NOT IN ( SELECT Sales.account_id FROM Sales WHERE YEAR(Sales.date) = 2022 GROUP BY Sales.account_id HAVING COUNT

>= 3) It’s syntactically correct again. But besides that, he is also correct. The language model was able to “guess” what should be used here NOT IN instead ofIN

.

In general, a neural network can do relatively complex things. For example, to this question

Slice of recent sales by customers

the correct answer will be given
SELECT Sales.account_id, Accounts.name, Sales.item_id, Product.name, Sales.quantity, Sales.amount, Sales.profit amount, Sales.profitability, Sales.date
FROM Sales
JOIN Accounts ON Sales.account_id = Accounts.id
JOIN Item ON Sales.item_id = Item.id

WHERE Sales.date = (SELECT MAX(date) FROM Sales WHERE account_id = Sales.account_id)

And the icing on the cake. All of this works great in reverse as well. I mean that you can ask to build a SQL query from a text description, or you can ask to build a text description from a SQL query. The text description for the request above would be:

This SQL query selects sales data, including account ID, account name, item ID, item name, item sold quantity, sales amount, profit amount, profit margin, and sale date. Three tables are used for this query: Sales, Accounts, and Product. JOIN statements are used to link these tables. The query selects only those records where the sale date is equal to the maximum sale date for this account.

Thus, a person who does not know SQL at all can form a working query to the database. And he can also get a textual description of the request and make sure that he was understood correctly.

Conclusion

The old idea of ​​giving everyone and everyone a tool to work with databases is experiencing a rebirth today. And along with it, relational databases are strengthening their positions. So “chip lay down.” So much has been said over the decades of almost complete dominance of the relational model that now it is not difficult for large language models to correctly solve most issues.

The advent of large language models means that the relational model will be with us for a long time to come. And it seems that the network model will not be able to supplant the relational one. True, the same language models create the conditions for the emergence of another “player”. But that’s a topic for another conversation. At the end of the article I want to recommend you free webinar postgres_fdw where OTUS experts will show you how to set up access between databases through the extension

and logical replication.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *