T-SQL coverage monitoring

TL;DR

  • Add a child sqlproj with tSQLt tests.

  • When building a pull request, we build both projects and get dacpacs that take the changes from the current branch into account.

  • We create a database clone on the fly and deploy the test project's dacpac into this clone.

  • We run tests and collect coverage through Extended Events using the SqlCover utility.

  • We convert the collected data into the formats expected by TeamCity and SonarQube.


Building CI/CD for databases is not a trivial task in any case. If the database project contains nothing but tables, then, in principle, we could stop at introducing sqlpackage. But if stored procedures, triggers and functions are written in the database, then we would like to test that code before publishing it to production. In this article I would like to share some details of implementing unit tests in the pipeline for databases with T-SQL code. We really do run the tests while building every pull request in our sqlproj repositories, but it was not all that simple.

Project preparation

In the world of T-SQL there are no alternatives: the only living framework is tSQLt. Even the folks from Microsoft already openly refer to it in conversations about a testing framework for SSDT. So we focus on tSQLt.

A sensible organization within the solution is very similar to what is done in .NET and other languages and frameworks: there is a main project and an additional, child project with tests that references the main one.

So we create two sqlproj projects. To keep the test code from turning into a meaningless standalone database, we set its reference to the main project as Same Database. This achieves two goals at once: the test code can reference the main project's objects directly, and on deployment everything still lands in a single database.

The dialog for adding a new database reference in an SSDT project. Note the value chosen in the Database location field.

The tSQLt framework itself has to be referenced from the same test project. The framework will be needed in every project with tests, and most likely you will be making your own improvements to its code. Therefore I recommend forking the source, packaging it as a local SSDT project just like all the other databases, building it into a dacpac and sharing it across all your T-SQL projects. The framework code obviously has to live in the same database as the tests, so we reference it the same way as the main project, with the Same Database option, but not as a project with sources: as a dacpac file.

This is what the dependencies in a test project should look like. On the right is a fragment of the AdventureWorksTest.sqlproj file.

After building the test project you get a dacpac containing everything: your objects, the tests and the tSQLt framework itself.

This fat dacpac will be deployed to the environments where these tests are supposed to run.

To understand whether the main project is in a deployable state, it is better to deploy its own dacpac first, and only then deploy the dacpac with tests on top of that first deployment. Otherwise a dacpac that nobody has ever tried to deploy will end up in production, and nobody needs that kind of surprise.

Test isolation

T-SQL code differs noticeably from code in JS, .NET, Python and many other languages in that it can only be executed on the DBMS side. That is, the code first has to be deployed somewhere, and testing is obviously possible only on a running database instance. One could speculate about spinning up an instance on the fly in some Docker container, if only the T-SQL coding tradition were not to slice the system into many databases and then work with all of those databases back and forth and diagonally from a single stored procedure. A database with that kind of spaghetti cannot be deployed in isolation to an empty instance: it needs all of its familiar surroundings, and every reference must resolve.

Under such circumstances, tests can only run on a pre-prepared server with the set of databases that make up a full system under test. Here anyone who wants to do anything runs into the fact that their actions affect other users and developers, while they themselves constantly feel the presence and activity of other people in the database. If we return to the original desire to test pull requests in parallel, it becomes clear that different programmers can affect not only the data but also the code and the structure of the database, and such changes cannot coexist in the same database at the same time.

Searching for a solution to this problem, we arrived at database cloning, namely the DBCC CLONEDATABASE command. Cloning is very fast even for large databases. The mechanism has a number of nuances, but overall it has performed well. We also considered creating a database on the fly from scratch, but sqlproj projects lack many of the properties and settings of a decent database blank set up under a DBA's supervision, and rolling out something from a dacpac that is reasonably similar to the real database is hard. But who knows, maybe we will come back to this option.
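
As a rough illustration, this is the kind of T-SQL the cloning step boils down to; the database names and options here are made up, not our actual job:

```sql
-- Illustrative sketch: clone the source database for a pull-request build.
-- Names and options are examples, not our exact pipeline code.
DBCC CLONEDATABASE (AdventureWorks, AdventureWorks_PR1234)
    WITH NO_STATISTICS, NO_QUERYSTORE;

-- The clone comes out read-only; it has to be switched to read-write before the
-- dacpac with tests can be deployed into it.
ALTER DATABASE AdventureWorks_PR1234 SET READ_WRITE WITH ROLLBACK IMMEDIATE;
```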

Regarding data: writing tests against whatever happens to be stored right now in a database taken from production, or even in a sandbox, is naive and futile. We found a suitable (albeit anonymized) client with the required tariff, balance and orders, and hardcoded his client_id into the test. Tomorrow that client is no longer a client, or has switched to a different tariff, or the product in the order has changed category, and the test falls apart. A reliable test does not depend on such circumstances. A decent tSQLt test first fakes its dependencies, inserts the minimum required set of data into the faked tables, turns the "extra" stored procedures and functions into fake stubs, and checks only what it is meant to check. Here clones again turn out to be a good fit: the tables in a clone are pristinely empty, so the naive variant of the test simply has nothing to work with.
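
Here is a minimal sketch of what such a test can look like; the test class, table and procedure names (OrderTests, dbo.Orders, dbo.GetClientBalance) are invented for illustration:

```sql
-- Hypothetical names throughout; the pattern is what matters.
EXEC tSQLt.NewTestClass @ClassName = N'OrderTests';
GO
CREATE PROCEDURE OrderTests.[test GetClientBalance sums unpaid orders]
AS
BEGIN
    -- Fake the dependency: the clone's table is empty anyway, but faking also
    -- strips constraints, defaults and triggers that the test does not care about.
    EXEC tSQLt.FakeTable @TableName = N'dbo.Orders';

    -- Arrange: the minimum data the procedure under test needs.
    INSERT INTO dbo.Orders (ClientId, Amount, IsPaid)
    VALUES (42, 100.00, 0), (42, 50.00, 0), (42, 999.00, 1);

    -- Act.
    DECLARE @Actual decimal(18, 2);
    EXEC dbo.GetClientBalance @ClientId = 42, @Balance = @Actual OUTPUT;

    -- Assert.
    EXEC tSQLt.AssertEquals @Expected = 150.00, @Actual = @Actual;
END;
GO
```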

However, preparing an isolated blank in which tests for many pull requests can run in parallel does not end with cloning. Broadly, the scenario looks like this (a sketch of the discovery-and-run steps follows the list):

- Create a clone
- Deploy the project with tests
- Grant the tSQLt assembly access to the clone
- Start coverage collection
- Discover the list of test classes
- Run an individual test or test class
- Re-run the failed tests
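
A small sketch of the discovery-and-run steps, assuming the test class from the example above; the real pipeline scripts also wrap this with coverage start/stop, timeouts and logging:

```sql
-- Discover the test classes that the deployed dacpac brought into the clone.
SELECT Name
FROM tSQLt.TestClasses
ORDER BY Name;

-- Run one class (or one test) at a time; the CI script iterates over the list above.
EXEC tSQLt.Run @TestName = N'OrderTests';
```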

Collecting results

Coverage can be collected with the SqlCover utility by Ed Elliott. The mechanics are as follows: before the tests are executed, an Extended Events session is created that tracks the well-known SP:StmtStarting. The details of each such event contain an object_id and a line number. If you parse the sources of the stored procedures and other programmability objects, you can match the events against the sources and work out which statements were executed during the tests.
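
Roughly, the session looks like the sketch below; the real definition lives in SqlCover's source, and the session and database names here are illustrative. SP:StmtStarting from the SQL Trace world corresponds to sp_statement_starting in Extended Events:

```sql
-- Illustrative sketch of the coverage session; SqlCover creates the real one.
-- The sp_statement_starting payload carries object_id and line_number, which is
-- what gets matched against the parsed sources.
CREATE EVENT SESSION [SqlCover_PR1234] ON SERVER
ADD EVENT sqlserver.sp_statement_starting (
    WHERE sqlserver.database_name = N'AdventureWorks_PR1234'
)
ADD TARGET package0.event_file (SET filename = N'SqlCover_PR1234.xel');
GO
ALTER EVENT SESSION [SqlCover_PR1234] ON SERVER STATE = START;
```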

Statement-level detail is so-so. Imagine a kilometre-long query with a pile of CTEs feeding into a MERGE with several actions and, say, an OUTPUT DELETED.* INTO @tbl on top. All of that is a single statement, and SP:StmtStarting will only tell us that we entered it. Whether a particular CTE did any work, which MERGE action we fell into, whether OUTPUT DELETED produced anything at all: none of these details will be known, let alone which branch of a CASE WHEN THEN ELSE was taken and which was not. It's a shame: so much effort for such clumsy statistics. Still, having tests and even this rough understanding of code coverage is much better than having neither.

The result of running the tests from one pull request in the TeamCity interface. Many tests passed, one failed.

Besides coverage, you need to know which tests failed and which passed. It is very desirable to know the specific error that occurred, and it would also be nice to see how long each test took. So you cannot simply call tSQLt.Run and relax: the information has to be pulled out of the framework and converted into the formats of the CI runner (TeamCity in our case) and of Sonar.
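
The per-test outcomes sit in tSQLt.TestResult after a run, and this is roughly what we export and convert into TeamCity service messages and a SonarQube report (the timing columns exist in recent tSQLt versions):

```sql
-- Roughly what gets exported for the CI runner and for Sonar.
SELECT Class,
       TestCase,
       Result,                                            -- Success / Failure / Error
       Msg,                                               -- error text for failed tests
       DATEDIFF(millisecond, TestStartTime, TestEndTime) AS DurationMs
FROM tSQLt.TestResult
ORDER BY Class, TestCase;
```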

SonarQube tracks how well the changed code is covered by tests, and watches for copy-paste.

The retry step mentioned in the scenario above is also a custom solution. Tests can fail because of a deadlock, a timeout and similar nonsense that is not necessarily related to the code of the test or of the module under test. After such an error it makes sense to run the test again within the same CI build: re-running the whole build for every timeout or deadlock means you will grow tired of waiting for a lucky combination of circumstances. But, I repeat, to understand why a test failed you first have to pull the results out of SQL Server and analyze them.
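
A hedged sketch of that retry step; the real script is more careful about what counts as a retryable error:

```sql
-- Re-run only the tests that failed, within the same CI build, in case the failure
-- was a deadlock or timeout rather than a genuine bug.
DECLARE @Failed TABLE (FullTestName nvarchar(max));

INSERT INTO @Failed (FullTestName)
SELECT QUOTENAME(Class) + N'.' + QUOTENAME(TestCase)
FROM tSQLt.TestResult
WHERE Result <> N'Success';

DECLARE @TestName nvarchar(max);
DECLARE retry CURSOR LOCAL FAST_FORWARD FOR SELECT FullTestName FROM @Failed;
OPEN retry;
FETCH NEXT FROM retry INTO @TestName;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC tSQLt.Run @TestName = @TestName;   -- a second failure here is a real failure
    FETCH NEXT FROM retry INTO @TestName;
END;
CLOSE retry;
DEALLOCATE retry;
```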

Polishing the tools

The tools mentioned above seem to have been created specifically for these tasks; all that remains is to take them off the shelf and bolt them onto the pipeline… But there are nuances.

SqlCover

#1 In this one spot the slowdown is terrible:

  The set of batches can be huge, and scanning it in every iteration is not the best approach

We loop over the statements parsed from the code, and for each one we search the so-called batches extracted from the collected Extended Events to see whether our object is there. Nested Loops x Table Scan. I had to go looking for this spot when the test run time on a relatively large project exceeded an hour; a quick check showed that the same tests finished in fifteen minutes without coverage collection. This was one of the suspects, and it turned out to be the culprit: a large project and many tests mean a large trace and, accordingly, a large _batches collection. The cure was to bluntly repack _batches into a Dictionary keyed by ObjectId.

#2 The trace file will be large in any case, and you need to reserve space for it by explicitly specifying a few gigabytes in the EVENT SESSION definition. If you don't, the trace gets cut off at the default size of 1GB. For a long time we could not understand why coverage was not growing even though new tests were being actively written; it turned out that 60GB+ of trace was simply not being saved anywhere, because we had not allocated space for it. The actual trace volume we discovered also became a reason to look for ways to reduce it. The tracing session is started by SqlCover and its definition lives in that utility's source code, so that is where the changes go too. The most obvious way to shrink the trace is to exclude the tSQLt framework objects from the session definition right away. Any effort to improve the session definition pays off: these volumes land on the disk of your own server, and the time to collect the data and compute coverage is part of your own builds.
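
Expanding the earlier session sketch, the relevant knobs are on the event_file target; the path and sizes below are illustrative, and in the real definition it is also worth filtering the tSQLt framework's own modules out of the event predicate:

```sql
-- Illustrative sizing, not SqlCover's exact code; without an explicit size the
-- trace was cut off at the default 1GB.
CREATE EVENT SESSION [SqlCover_PR1234] ON SERVER
ADD EVENT sqlserver.sp_statement_starting (
    WHERE sqlserver.database_name = N'AdventureWorks_PR1234'
)
ADD TARGET package0.event_file (
    SET filename           = N'D:\xe\SqlCover_PR1234.xel',
        max_file_size      = 10240,   -- MB per rollover file
        max_rollover_files = 6        -- roughly 60GB in total for a large test run
);
```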

#3 If there are triggers in your databases and you want to measure their coverage, you will have to figure out trigger tracking yourself. After a table is faked its object_id changes, and tSQLt.ApplyTrigger attaches the trigger to a table that did not exist before the tests started and will not exist afterwards. Triggers are not moved from one table to another, so a new temporary object is created, and SqlCover will not find such an identifier in sys.objects. You have to match the trace events produced by the temporary trigger on the fake table with the real trigger on the original table.

The author of the utility is, of course, a great fellow, and the tool is unique and undoubtedly useful. But the code is rough and unkempt. There is room for small optimizations, and in general it would be worth going over all the source code with an experienced eye, wiring up StyleCop and adding tests.

tSQLt

#1 If your databases actively look into each other, you will have to add stored procedures for faking synonyms yourself: remote tables, functions and procedures hidden behind local synonyms have to be turned into full-fledged local fake objects.
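
A hedged sketch of the manual workaround for a single synonym; dbo.RemoteOrders and its columns are invented, and our real solution wraps this pattern in a helper procedure:

```sql
-- Replace a synonym that points at a remote table with a local stand-in,
-- so that tSQLt.FakeTable has something local to operate on.
IF OBJECT_ID(N'dbo.RemoteOrders', N'SN') IS NOT NULL
    DROP SYNONYM dbo.RemoteOrders;

-- Minimal local copy of the remote table's shape (columns are illustrative).
CREATE TABLE dbo.RemoteOrders (OrderId int, ClientId int, Amount decimal(18, 2));

EXEC tSQLt.FakeTable @TableName = N'dbo.RemoteOrders';
```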

#2 If the SQL Server version allows it, it is worth rewriting all the string aggregation from FOR XML to STRING_AGG. Concatenating through XML is not exactly fast, and at these volumes it shows. The framework assembles dynamic queries from metadata, so there is a lot of string aggregation in the code.
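
For example, the classic FOR XML PATH concatenation and its STRING_AGG equivalent (dbo.Orders is just the example table again):

```sql
-- The old pattern, which works on any supported version: aggregate through XML.
SELECT STUFF((
    SELECT ',' + c.name
    FROM sys.columns AS c
    WHERE c.object_id = OBJECT_ID(N'dbo.Orders')
    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '');

-- The same thing on SQL Server 2017+; the cast keeps STRING_AGG from being
-- truncated at 8000 bytes.
SELECT STRING_AGG(CAST(c.name AS nvarchar(max)), ',')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.Orders');
```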

#3 If the test author forgot to fake a large table and called tSQLt.AssertEqualsTable to compare the result with a small test set of rows, the procedure starts printing a delta with as many millions of rows as there were in the table that was never faked. Nobody needs those few hours of pointless spam, so just add a limit. Somewhat arbitrarily we chose tSQLt.Private_Print for this. Nobody expects thousands of prints from the framework, so this one change covered all variations of the described scenario at once.
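
The idea of the guard looks roughly like this; the actual patch inside tSQLt.Private_Print differs in detail, and the ceiling is arbitrary:

```sql
-- Sketch of the idea only: cap an oversized message before it gets printed.
DECLARE @Message nvarchar(max) = N'...imagine a multi-million-row delta here...';
DECLARE @MaxLen  int = 100000;   -- arbitrary ceiling, enough for any sane diff

IF LEN(@Message) > @MaxLen
    SET @Message = LEFT(@Message, @MaxLen) + NCHAR(10) + N'... output truncated ...';

EXEC tSQLt.Private_Print @Message = @Message;
```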

"I'm a simple fellow," it says: "however much you give me, that's how much I'll print."

#4 Preparing a decent test with faked dependencies takes a lot of effort. It is worth writing a stored procedure that generates a test blank with the dependencies already faked. The dependencies of the procedure under test can be extracted from sys.sql_expression_dependencies.
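
A hedged sketch of the discovery query such a generator can start from; dbo.GetClientBalance is the hypothetical procedure from the earlier example:

```sql
-- List what the procedure under test touches, so a generator can emit
-- tSQLt.FakeTable / tSQLt.SpyProcedure / tSQLt.FakeFunction calls for each item.
SELECT d.referenced_schema_name,
       d.referenced_entity_name,
       o.type_desc                      -- USER_TABLE, SQL_STORED_PROCEDURE, ...
FROM sys.sql_expression_dependencies AS d
LEFT JOIN sys.objects AS o
       ON o.object_id = d.referenced_id
WHERE d.referencing_id = OBJECT_ID(N'dbo.GetClientBalance');
```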

#5 In several places the framework's tests put something into permanent tables, and part of that information is lost once the transaction wrapping the test session is rolled back. Most of this information is meaningless outside of a specific test run anyway, and when several sessions work actively in parallel, tests even start overlapping in those tables, cleaning up or reading something that is not theirs. Roughly all such places need to be cut out and the persistent tables replaced with temporary ones. Although the pipeline runs tests inside clones, programmers do their debugging in the original copy of the database on the sandbox server, so it is definitely worth eliminating this unwanted contention on shared resources.

The authors of the framework are, of course, great fellows, and the tool is unique and undoubtedly useful. But the code is rough and unkempt. It seems odd that the guys clearly know about tests but have apparently never heard of formatters and linting. It is highly advisable to go over the code with an experienced eye and a confident hand.

In short, you will have to do quite a lot of finishing with the proverbial file.

Pitfalls

  • There are a lot of nuances associated with database clones, including bugs like this one: a DacFx bug that in turn depends on a SQL Server bug.

  • Before creating crowds of clones you need to come to an agreement with the DBAs, and write a job that will clean up the abandoned leftovers of interrupted sessions.

  • Programmers need the ability to run tests not only on the CI side but also during development. This may require a separate server, and a period of getting used to the fact that developers sometimes step on each other in procedures and tables.

  • Coverage via Extended Events cannot provide the same detail as in "normal" programming languages. There is no line-by-line detail, only the level of individual statements.

  • A lot of time will have to be spent reworking SqlCover and tSQLt.

  • First you need to build a CI/CD for the database, at least without tests.

  • Everything around the SqlCover, tSQLt and SqlPackage calls is a custom solution. All of it has to be written by hand. Kilometers of scripts.


In any case, the game is worth the candle. A T-SQL development process with tests in the pipeline and Sonar is far more mature and decent than one without them. That becomes clear even to skeptics the first time a test fails.
