When “doing it badly” == “doing it better”

In the IT world there are many concepts and approaches that make it easier to design software, extend an architecture, and build robust products. KISS, DRY, SOLID and other buzzwords are things a programmer is expected to know to be considered at least decent. But this post touches on an already well-worn topic: all these approaches are recommendations, not immutable law.

OOP opponents, when another post comes out about the cons of patterns and development approaches

In this post I want to talk about specific situations I ran into while working as an MS SQL developer.

DRY

The upsides are clear enough:

  • Less code to solve a problem = more functionality delivered for the same salary.

  • Even a single duplicate is a symptom of a suboptimal architecture, one that tends to breed even more repetitive code.

  • The more duplicates, the higher the chance of missing one of them when modifying or debugging a module. Every subsequent encounter with such a piece of code forces you to walk through all the places where the same change has to be made.

However, I have a couple of situations stored away where I had to encroach on the sacred and violate this principle, so that the functionality I delivered did not cause pain to our respected users.

Duplicate filter

I needed to add grouped data to an existing query. I can’t remember all the details, but the essence of the task was that window functions could not help, so I had to build a separate SELECT:

;WITH CTE AS (
  SELECT ПолеДляСвязки, SUM(Поле1) AS Сумма, AVG(Поле2) AS Среднее
  FROM Table1 t1
  ...dozens of joins...
  ...some groupings...
)
SELECT t1.Поле1, t1.Поле2, .. t1.ПолеХ, r.Сумма, r.Среднее
FROM Table1 t1
JOIN CTE r ON t1.ПолеДляСвязки = r.ПолеДляСвязки
...dozens of joins...
WHERE t1.ПолеДляФильтрации = 'Some value'

And then a problem arose: Table1 contained tens of millions of rows, and the pile of additional joins pushed the execution time of the extra SELECT towards infinity. The fix is very easy — no, wait, no dashes: the fix is very easy. Pushing the filter from the main query into the CTE solved the problem completely.

;WITH CTE AS (
  SELECT ПолеДляСвязки, SUM(Поле1) AS Сумма, AVG(Поле2) AS Среднее
  FROM Table1 t1
  ...dozens of joins...
  ...some groupings...
  WHERE t1.ПолеДляФильтрации = 'Some value'
)
SELECT t1.Поле1, t1.Поле2, .. t1.ПолеХ, r.Сумма, r.Среднее
FROM Table1 t1
JOIN CTE r ON t1.ПолеДляСвязки = r.ПолеДляСвязки
...dozens of joins...
WHERE t1.ПолеДляФильтрации = 'Some value'

Of course, many will point out that a subquery could have been used instead, predicate pushdown would then kick in, and the duplication could have been avoided, but:

  • It may not work, especially if the query already performs many different operations, in which case you end up duplicating the filter anyway.

  • A CTE has its own metadata, which lets the optimizer treat its data as a separate table. This can greatly speed up the query.

  • If you are unlucky, the optimizer will pick an execution plan in which the subquery is re-executed for every row of the outer query.

  • A CTE reads better and I like it visually, so that’s what I want.
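For comparison, the subquery variant mentioned above might look roughly like this (a sketch reusing the same placeholder names; whether the optimizer actually pushes the outer filter into the derived table depends on the plan it chooses):

```sql
SELECT t1.Поле1, t1.Поле2, r.Сумма, r.Среднее
FROM Table1 t1
JOIN (
    SELECT ПолеДляСвязки, SUM(Поле1) AS Сумма, AVG(Поле2) AS Среднее
    FROM Table1
    -- ...the same dozens of joins and groupings as in the CTE version...
    GROUP BY ПолеДляСвязки
) r ON t1.ПолеДляСвязки = r.ПолеДляСвязки
WHERE t1.ПолеДляФильтрации = 'Some value'
-- With luck, the optimizer pushes the ПолеДляФильтрации predicate down
-- into the derived table, so the filter does not have to be written twice.
```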

Don’t use hints

The developer describes what he wants to get; the optimizer decides how to do it at the data level. Everyone knows this: the optimizer knows best, and you shouldn’t leave it hints.

If the optimizer was a real person

In principle, it’s hard to disagree with this. A NOLOCK left behind years ago to speed things up may cause inaccurate calculations in a new module, and specifying INNER LOOP JOIN can ruin your life once the number of rows in the table can no longer be counted on the fingers of two hands. And yet the gentlemen from Microsoft left us the ability to use hints for a reason. One can think up endless examples of situations where a hint is appropriate, so let’s consider two: one where the code works absolutely incorrectly without a hint, and one where its absence causes only minor inconvenience.

CURSOR HINTS

At least twice (both with my own code and with inherited code) I ran into a situation where, after statistics were updated, a new execution plan was generated for a procedure, and the cursor in it became a source of problems.

Suppose we have the simplest possible cursor:

DECLARE @id INT

DECLARE OrdersCursor CURSOR FOR
SELECT id
FROM dbo.Orders
WHERE NextOrder IS NULL

OPEN OrdersCursor
FETCH NEXT FROM OrdersCursor INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.CreateEmptyNextOrder @parent = @id

    FETCH NEXT FROM OrdersCursor INTO @id
END
CLOSE OrdersCursor
DEALLOCATE OrdersCursor

The example is contrived, but simple and clear: we have a table of orders, and each order has a successor in the chain. Say we want to take all current final orders and create empty continuations for them, to be filled in later by clients. If the stars align and you are lucky, the optimizer will do exactly that. But you may not be so lucky. On the first execution, or, even worse, when the cached execution plan changes on the server, we can get an endlessly running cursor: it creates a new row in Orders, then picks that same row up itself, creates a descendant for it, and so on ad infinitum.

The solution is simple: add a hint that this cursor is STATIC, meaning we are only interested in the data as of the moment the cursor is opened, and subsequent updates to the table must not affect its work. The data will be copied into a temporary table in tempdb, and that is what the cursor will iterate over.

DECLARE @id INT

DECLARE OrdersCursor CURSOR
STATIC -- Hint to the server
FOR
SELECT id
FROM dbo.Orders
WHERE NextOrder IS NULL

OPEN OrdersCursor -- The data is snapshotted at this point
FETCH NEXT FROM OrdersCursor INTO @id
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.CreateEmptyNextOrder @parent = @id

    FETCH NEXT FROM OrdersCursor INTO @id
END
CLOSE OrdersCursor
DEALLOCATE OrdersCursor

WITH (INDEX)

The optimizer, for reasons known only to itself, refuses to use the index you need. Not a common problem, but it does happen. Here you need to be careful and reason from the specific situation. In my case there was a procedure inside which the optimizer stubbornly did an INDEX SCAN over the clustered key, ignoring the index I had created. The table was not too big, so the scan plus the rest of the functionality ran in 1-2 seconds, but why not bring that down to a couple of milliseconds by adding just one hint?
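The hint itself is a one-liner. A sketch with made-up table and index names (the original procedure is not shown in the post):

```sql
-- Force the optimizer to use a specific nonclustered index instead of
-- scanning the clustered key. Names here are hypothetical.
SELECT o.id, o.Status
FROM dbo.Orders o WITH (INDEX (IX_Orders_Status))
WHERE o.Status = 'Processed'
```

The usual caveat applies: a hard-coded index name breaks the query if the index is ever dropped or renamed, so this is a last resort after the plan has been inspected.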

Business logic over adequacy

This is not really about development methodology, but rather about situations where users want more and more convenience, and management operates on the logic of “well, if they ask, it must be done.”

There have been such things, I am not joking or exaggerating

The situation is as follows: goods arrive at the warehouse, employees accept them, create an invoice for each batch, and start processing the document (inside this processing there is a pile of business logic, each part of which matters to a specific department; the warehouse only cares about barcode generation), then print the barcodes, stick them on and shelve the goods. It seemed a simple, streamlined process, and everything went well until the workers decided that “processing takes too long.”

To be honest, the processing was not that long: for an invoice of a few dozen lines, only a few seconds. But when the count ran into hundreds or thousands of records, it could grow to 3 minutes. To our advice to “create invoices in smaller batches, since it makes no difference,” all we heard was “we want it this way.”

There was nothing to be done; I had to remove the overall transaction and replace it with row-by-row transactions. After that, users could interrupt the processing of an invoice at any moment, stick labels on the products that had already been processed, and shelve them.

It was and it is

But then another problem arose: at the start of processing, the status of the invoice changes to “in progress,” which forbids other people from touching the document. This functionality now had to be accounted for in the redesigned module as well, adding the ability to change the status back if the client cancels. Here the ability to create nested transactions came to the rescue:

  • If an error occurs in the inner transaction (for example, due to internal checks) or the procedure is interrupted during execution, we roll back to the savepoint.

  • We commit the outer transaction as of the savepoint and change the status back.
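The savepoint pattern described above can be sketched in T-SQL roughly like this (procedure and table names are made up; the real module was more involved):

```sql
BEGIN TRANSACTION -- outer transaction: holds the 'in progress' status change
UPDATE dbo.Invoices SET Status = 'InProgress' WHERE id = @invoiceId

SAVE TRANSACTION RowProcessing -- savepoint before the row-by-row work

BEGIN TRY
    -- ...process the invoice rows one by one (hypothetical procedure)...
    EXEC dbo.ProcessInvoiceRows @invoiceId = @invoiceId
END TRY
BEGIN CATCH
    -- An internal check failed or the client interrupted processing:
    -- roll back only to the savepoint, keeping the outer transaction alive...
    ROLLBACK TRANSACTION RowProcessing
    -- ...so the status can be changed back before committing.
    UPDATE dbo.Invoices SET Status = 'New' WHERE id = @invoiceId
END CATCH

COMMIT TRANSACTION
```

Note that `ROLLBACK TRANSACTION <savepoint>` undoes work back to the savepoint without decrementing `@@TRANCOUNT`, which is exactly what makes this “commit the outer part, discard the inner part” behavior possible.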

Of course, the users were satisfied, while we added a pile of crutches to the system, along with the need to remember that the procedure can be interrupted at any moment, and to keep track of which parts of it must be rolled back and which must be persisted no matter what.

Conclusion

These are just a couple of examples confirming that, at least sometimes, you do need to build crooked systems or write code off-template, the way the business asks for it. I still have a few stories about making a deal with my inner perfectionist and doing “the right thing,” and if this post is not showered with slippers, I will write a sequel.
