Why FAST_FORWARD drastically improves cursor performance

Beginning developers often use cursors with default options. This continues until one of the senior developers or database administrators suggests that with the parameter FAST_FORWARD everything will work much faster. You may have researched this yourself and read Aaron Bertrand’s post about cursor performance benchmark with various parameters. I confess for years I didn’t care why FAST_FORWARD sometimes sped up my requests. The name had the word “FAST” in it, and that was enough for me.

But I recently ran into a problem in production where using the correct cursor options resulted in a 1000x performance improvement. I decided that ten years of ignorance was enough and finally did some research on different cursor options. This post contains a reproduction and discussion of the problem that occurred.

Problem reproduction

The code below creates a 16 GB table. Column ID defined as a primary key, also known as a clustered index. On a column ID2 nonclustered index created. You can decrease the value TOP to reduce the number of rows and create a smaller table, but don’t make it smaller than 200,000.

DROP TABLE IF EXISTS tbl_1;
CREATE TABLE tbl_1 (
ID BIGINT NOT NULL,
ID2 BIGINT NOT NULL,
PAGE_FILLER VARCHAR(5000) NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO tbl_1 WITH (TABLOCK)
SELECT RN, RN % 100000, REPLICATE('Z', 5000)
FROM
(
    SELECT TOP (2000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) q
OPTION (MAXDOP 1);
CREATE INDEX IX2 ON tbl_1 (ID2);

A simplified example of a problematic SELECT‘and from production it looks like this:

Select ID
from tbl_1
WHERE ID2 < 1
ORDER BY ID;

The filter is very selective: only 20 rows out of 2 million fall under the condition. I expect the query optimizer to use the index on ID2 and the request will be executed instantly. The following stored procedure defines a cursor FAST_FORWARDwhich takes those 20 rows and simply iterates over them without doing anything else:

CREATE OR ALTER PROCEDURE CURSOR_WITH_FF
AS
BEGIN
SET NOCOUNT ON;
Declare @ID BIGINT;
Declare FF Cursor FAST_FORWARD for
Select ID
from tbl_1
WHERE ID2 < 1
ORDER BY ID;
Open FF;
Fetch Next from FF into @ID;
WHILE @@FETCH_STATUS = 0
BEGIN
    Fetch Next from FF into @ID;
END;
Close FF;
Deallocate FF;
END;

On my machine, this stored procedure takes about 2ms to execute. The query plan in this case is very similar to the plan for a simple SELECT. Although there is some additional garbage from the cursor, but in this case it does not affect performance:

Unfortunately, if you remove the parameter FAST_FORWARD, then this code will be executed for 50 seconds. What caused such a huge difference?

What’s happening

Let’s start with the query execution plan for the default cursor:

The query optimizer decided to execute Clustered Index Scan instead of using an index IX2. We are getting all the rows from the cursor, so we have to read the whole table. That’s 20 scans, each covering about 5% of the table. Definitely to be expected that this is a less efficient plan with FAST_FORWARD. But 50 seconds is too slow. Let’s look at the statistics of expectations:

Unsurprisingly, I/O contributes the most to wait time (how could it not?). But why so many? I have fast local storage with an average latency of less than 1ms. For comparison, I decided to try to force the query plan selected by the cursor, but run it outside the cursor. The following query ran on my machine in about 8 seconds:

CHECKPOINT;
DBCC DROPCLEANBUFFERS;
Select ID
from tbl_1 WITH (INDEX(1))
WHERE ID2 < 1
ORDER BY ID
OPTION (MAXDOP 1);

Using sys.dm_io_virtual_file_stats, I found that the cursor was doing about 240,000 I/Os at an average of 66 KB per operation. Single SELECT performed about 10,000 I/Os with an average I/O size of 1.7 MB. The key difference is that only the first execution of a query with a cursor can perform read-ahead reads:

In the case of a cursor, read-ahead reads are not performed for 95% of the I/O required to complete a query. Even an I/O latency of less than a millisecond can be painful when you have 240,000 I/Os to perform. It turns out that for an efficient search of 20 lines, the cursor with FAST_FORWARD able to use the index. A default cursor performs about 15 GB of I/O that cannot be used for read ahead.

When using clouds, the situation will, of course, be much worse. With a delay of 5-10ms for general purpose managed instances, you can expect a default cursor to run for 20 to 40 minutes. For fun, I decided to test this on a managed instance with 4 vCores. Cursor with parameter FAST_FORWARD ran for about 120 ms. With default settings, about 70 minutes. Here are the execution wait statistics:

On the positive side, using the correct cursor options improved managed instance performance by a factor of 35,000.

Why is this happening

I’ll start by saying that I don’t want to be an expert in using cursors. I’d rather be an expert on opting out of using them. I’ve often found it difficult to explain why the default cursor has such a bad query plan, but luckily I found an article from 12 years ago. I’ll quote the entire section on dynamic plans because you never know when a Microsoft blog post will disappear:

“A dynamic plan can be processed incrementally. To do this, in SQL Server, we serialize the execution state of a query into what we call a “token”. Later, we can build a new query execution tree using the marker to reposition the statements. Moreover, the dynamic plan can move forward and backward relative to its current position. Dynamic plans are used by both dynamic and some fast_forward cursors.

A dynamic plan consists only of dynamic statements, statements that support markers and forward and backward movement. This corresponds, although not exactly, to the processing of requests by stream operators (as opposed to stop-and-go). But not all stream operators are dynamic. In SQL Server, dynamic means:

1. The operator can be repositioned to the current position using a marker or to a relative position (next or previous) in relation to the current one.

2. The state of the operator must be small for the marker to be small. The statement cannot store row data. In particular, there is no sort table (sort table), hash table (hash table) and work table (work table). You cannot save even one line, because one line can be very large.

Without a dynamic plan, the cursor would need temporary storage to store the result set of the query (or its set of keys). The dynamic plan does nothing of the sort! However, some operators are disqualified, such as hash join, hash agg, compute sequence, and sort. This leads to suboptimal plans.”

In short, you can think of a dynamic plan as similar to a plan without blocking statements, but with some additional restrictions. I have learned from reliable sources that a cursor with default parameters will always select a dynamic plan if one is available. For our SELECT dynamic plan is indeed available. You can use an ordered clustered index to get sorted rows without performing an explicit sort. Index IX2 can’t be used for sorting because I’m filtering by condition with inequality by ID2. Changing the condition to equality will allow the use of a dynamic plan with an index IX2:

What about other cursor options? If we return to the original request, then specifying the parameter STATIC or KEYSET avoids bad query plan and use index IX2 With Index Seek. Both of these options write a set of cursor data to a database table tempdbso it’s intuitive that there won’t be any constraint causing Clustered Index Scan.

Specifying a parameter FAST_FORWARD allows the query optimizer to choose between a static and a dynamic plan. In this case, the static plan is obviously much more efficient, and the query optimizer knows this. It chooses a static plan that does not Clustered Index Scan.

For the sake of completeness, specifying the parameter READ_ONLY also leads to Index Seekif the parameter is not specified DYNAMIC.

In general, I would say that FAST_FORWARD is still a good starting point for your cursors, as long as the limitations associated with it are acceptable in your application. For performance similar to conventional SELECTone FAST_FORWARD is not always enough. How notes Eric, when using this option, you will get a prompt with MAXDOP 1. The query optimizer can also choose a worse dynamic plan over a static plan if the estimated cost of the query plan is not true. Usage STATIC instead of FAST_FORWARD can be quite useful in some cases, but to know for sure you need to test your cursor. Of course, you can write code without using a cursor.

Reducing access rights

I am constantly looking for interesting examples and drew attention to the following point in documentation:

If in DECLARE CURSOR do not specify READ_ONLY, OPTIMISTIC or SCROLL_LOCKSthen the default values ​​look like this:

If the instruction SELECT does not support updates (insufficient permissions, access to remote tables that do not support updates, etc.), then the cursor becomes READ_ONLY.

Will I be able to improve performance by running the code as a user with less privileges? Unfortunately, I wasn’t able to reproduce it. I created a login ‘erik’ that can’t modify data in the table but got the same query plan as before. Also there was no change in the column properties v sys.dm_exec_cursors.

I don’t know if this is a mistake in the documentation or I did something wrong on my side. It would be interesting to see a working example of this.

Conclusion

Some queries may have extremely poor performance with the default cursor options. You can identify these queries by sorting them by total logical reads in the standard Query Store reports in SSMS. In our production environment, we had a few cursors that did a lot more logical reads than everyone else, so it wasn’t hard to spot them. We have significantly speeded up some of them by simply adding the parameter FAST_FORWARD.


All interested are invited tomorrow at 20:00 to open lesson OTUS on the topic “Using and optimizing stored procedures in MS SQL Server”. At the lesson, we will consider the eternal question: “where is the logic?” In the application or in the database? Let’s talk about the usefulness of stored procedures and how to optimize their performance in MS SQL Server.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *