In the IT world, there are many concepts and approaches that make it easier to design systems, extend architecture, and build robust products. KISS, DRY, SOLID and other buzzwords are things a programmer is expected to know to be considered at least decent. But this post touches on an already well-worn point – all of these approaches are recommendations, not immutable law.
In this post, I want to talk about specific situations I ran into while working as an MS SQL developer.
The upsides of DRY are clear:

- Less code to solve a problem = more functionality delivered for the same salary.
- If a duplicate appears even once, it is a symptom of questionable architecture, which tends to breed even more repetitive code.
- The more duplicates there are, the higher the chance of missing one of them when modifying or debugging a module. Every encounter with such a piece of code forces you to hunt down all the other places where the same change has to be made.
However, I have a couple of situations stashed away where I had to encroach on the sacred and violate this principle, so that the functionality I built would not cause pain to our respected users.
I needed to add grouped data to an existing query. I can't remember all the details, but the essence of the task was such that window functions could not help, so I had to build a separate CTE:
```sql
;WITH CTE AS (
    SELECT ПолеДляСвязки,
           SUM(Поле1) AS Сумма,
           AVG(Поле2) AS Среднее
    FROM Table1 t1
    ...dozens of joins...
    ...some groupings...
)
SELECT t1.Поле1, t1.Поле2, ..., t1.ПолеХ,
       r.Сумма, r.Среднее
FROM Table1 t1
JOIN CTE r ON t1.ПолеДляСвязки = r.ПолеДляСвязки
...dozens of joins...
WHERE t1.ПолеДляФильтрации = 'Какое-то значение'
```
And then a problem arose – Table1 held tens of millions of records, and the pile of additional joins drove the execution time of the extra select toward infinity. The problem is very easy to solve – duplicate the filter from the main query inside the CTE, which fixed things and then some.
```sql
;WITH CTE AS (
    SELECT ПолеДляСвязки,
           SUM(Поле1) AS Сумма,
           AVG(Поле2) AS Среднее
    FROM Table1 t1
    ...dozens of joins...
    ...some groupings...
    WHERE t1.ПолеДляФильтрации = 'Какое-то значение'
)
SELECT t1.Поле1, t1.Поле2, ..., t1.ПолеХ,
       r.Сумма, r.Среднее
FROM Table1 t1
JOIN CTE r ON t1.ПолеДляСвязки = r.ПолеДляСвязки
...dozens of joins...
WHERE t1.ПолеДляФильтрации = 'Какое-то значение'
```
Of course, many will point out that a subquery could have been used instead – the optimizer could then push the predicate down into it, and the duplication could have been avoided – but:

- It may not work, especially if the query already contains many different operations; then you would have to duplicate the filter anyway.
- A CTE carries its own metadata, which lets the optimizer treat its result as a separate table. This can greatly speed up the query.
- If you are unlucky, the optimizer will choose an execution plan in which the subquery is re-evaluated for every row of the outer query.
- A CTE reads better and I like it visually, so that's what I use.
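For comparison, the derived-table variant that the objection refers to might look roughly like this (a sketch reusing the same placeholder names from the query above – the single WHERE is the predicate the optimizer would, in the good case, push down into the subquery):

```sql
SELECT t1.Поле1, t1.Поле2, ..., t1.ПолеХ,
       r.Сумма, r.Среднее
FROM Table1 t1
JOIN (
    -- Derived table instead of a CTE: no duplicated filter,
    -- but you are at the optimizer's mercy for predicate pushdown
    SELECT ПолеДляСвязки,
           SUM(Поле1) AS Сумма,
           AVG(Поле2) AS Среднее
    FROM Table1
    ...dozens of joins...
    ...some groupings...
) r ON t1.ПолеДляСвязки = r.ПолеДляСвязки
WHERE t1.ПолеДляФильтрации = 'Какое-то значение'
```

Whether the pushdown actually happens depends on the plan the optimizer settles on, which is exactly the gamble described above.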
Don’t use hints
The developer writes what result he wants; the optimizer decides how to get it at the data level – everyone knows this. It supposedly knows better, and you shouldn't leave it hints.
In principle, it's hard to disagree with this. A NOLOCK left by someone else to speed things up may lead to inaccurate calculations years later in a new module, and a hard-coded INNER LOOP JOIN can ruin your life once the number of records in the table can no longer be counted on the fingers of two hands. And yet the gentlemen from Microsoft left us the ability to use hints for some reason. One can invent examples of situations with appropriate hints endlessly, so let's consider two: one where the query works absolutely incorrectly without a hint, and one where its absence causes only minor inconvenience.
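As a reminder of what the hints mentioned above look like in T-SQL (table and column names here are purely illustrative):

```sql
-- NOLOCK: reads without shared locks, i.e. dirty reads;
-- fast, but may see uncommitted or double-counted rows
SELECT SUM(Amount)
FROM dbo.Payments WITH (NOLOCK);

-- INNER LOOP JOIN: forces a nested-loops join;
-- fine while the tables are tiny, disastrous at scale
SELECT o.id, c.Name
FROM dbo.Orders o
INNER LOOP JOIN dbo.Clients c ON c.id = o.ClientId;
```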
I have run into this situation at least twice (both with my own code and with inherited code): after a statistics update, a new execution plan was generated for the procedure, and the cursor turned into a generator of problems.
Suppose we have the simplest possible cursor:
```sql
DECLARE @id INT

DECLARE OrdersCursor CURSOR FOR
    SELECT id FROM dbo.Orders WHERE NextOrder IS NULL

OPEN OrdersCursor
FETCH NEXT FROM OrdersCursor INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.CreateEmptyNextOrder @parent = @id
    FETCH NEXT FROM OrdersCursor INTO @id
END
CLOSE OrdersCursor
DEALLOCATE OrdersCursor
```
The example is contrived, but simple and straightforward: we have a table of orders, and each order has a successor in a chain. Say we want to take all the current final orders and create continuations for them – empty ones, to be filled in later by clients. If the stars align and you're lucky, the optimizer will do exactly that. But you may not be so lucky. On the first execution, or worse, when the cached execution plan changes on the server, we can get an infinitely running cursor that creates a new row in Orders, picks that same row up into its own result set, creates a descendant for it, and so on ad infinitum.
The solution is simple – add a hint that this cursor is STATIC: we are only interested in the data as of the moment of opening, and subsequent updates to the table must not affect the run. The data is snapshotted into a temporary table in tempdb, and the cursor iterates over that.
```sql
DECLARE @id INT

DECLARE OrdersCursor CURSOR STATIC -- hint to the server
FOR
    SELECT id FROM dbo.Orders WHERE NextOrder IS NULL

OPEN OrdersCursor -- the data snapshot is taken here
FETCH NEXT FROM OrdersCursor INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.CreateEmptyNextOrder @parent = @id
    FETCH NEXT FROM OrdersCursor INTO @id
END
CLOSE OrdersCursor
DEALLOCATE OrdersCursor
```
The optimizer, for reasons known only to itself, sometimes refuses to use the right index – not a common problem, but it does happen. Here you need to be careful and reason from the specific situation. In my case there was a procedure in which the optimizer stubbornly did an INDEX SCAN of the clustered key, ignoring the index I had created. The table was not too big, so the scan plus the rest of the logic ran in 1-2 seconds – but why not bring that down to a couple of milliseconds by adding just one hint?
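The fix in that case is a table-level index hint. A sketch of what it looks like – the table, index, and column names below are illustrative, not the actual system:

```sql
-- Force the optimizer to use the nonclustered index it keeps ignoring;
-- IX_Orders_Status is an assumed name for the index created by hand
SELECT id, ClientId
FROM dbo.Orders WITH (INDEX(IX_Orders_Status))
WHERE Status = 'Pending';
```

The usual caveat applies: a forced index outlives the data distribution it was chosen for, so a hint like this deserves a comment explaining why it is there.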
Business logic over common sense
This one is not really about development methodology, but about situations where users want ever more convenience, and management operates by the logic "well, if they ask, then it must be done."
The situation is as follows – goods arrive at the warehouse, employees accept them, create an invoice for each batch, and start processing the document (inside that processing there is a pile of business logic, each part relevant to a specific department, but the warehouse only cares about barcode generation), then print the barcodes, stick them on, and shelve the goods. It seemed like a simple, streamlined process, and everything went fine until the workers decided "processing takes too long."
To be honest, the processing was not that long: for an invoice of a few dozen lines, only a few seconds. But when the count went into hundreds or thousands of records, it could grow to 3 minutes. To our advice "create invoices in smaller batches, since it makes no difference," we only heard "we want it this way."
There was nothing for it – I had to remove the overall transaction, replacing it with row-by-row transactions. After that, users could interrupt invoice processing at any moment, stick labels on the already-processed goods, and shelve them.
But then another problem arose – at the start of processing, the invoice status changes to "in progress," which prevents other people from touching the document. This functionality now had to be accounted for in the redesigned module as well, by adding the ability to change the status back if the client cancels. Here the ability to create nested transactions came to the rescue:
- If an error occurs in the inner transaction (for example, from internal checks) or the procedure is interrupted mid-execution, we roll back to the savepoint.
- We commit the outer transaction as of the savepoint and change the status back.
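A minimal sketch of this savepoint pattern in T-SQL – the procedure, table, and variable names are assumptions for illustration, not the real system:

```sql
BEGIN TRANSACTION
SAVE TRANSACTION BeforeRow -- savepoint we can return to

BEGIN TRY
    -- Hypothetical per-row processing procedure
    EXEC dbo.ProcessInvoiceRow @invoice = @invoiceId, @row = @rowId

    -- Success: keep everything, including this row
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    -- Failure or interruption: undo only this row's changes,
    -- preserving all work done before the savepoint
    ROLLBACK TRANSACTION BeforeRow

    -- Return the invoice to an editable status
    UPDATE dbo.Invoices SET Status = 'Open' WHERE id = @invoiceId

    COMMIT TRANSACTION
END CATCH
```

Note that ROLLBACK TRANSACTION with a savepoint name undoes only the work done since SAVE TRANSACTION; the outer transaction stays open and still has to be committed.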
Of course, the users were satisfied, while we acquired a bunch of crutches in our system, plus the obligation to remember that the procedure can be interrupted at any moment – and that some parts of it must be rolled back while others must be preserved no matter what.
These are just a couple of examples confirming that at least sometimes you have to build crooked systems or write code off-template, the way the business asks. I still have a few stories about making a deal with my inner purist and doing "the right thing," and if this post is not showered too heavily with slippers, I will write a sequel.