Table partitioning and compilation time of a query plan in SQL Server


This translation of a useful article was prepared for prospective students of the “MS SQL Server Developer” course.

We also invite everyone to an open webinar on the topic “Polybase: life before and after”. At the webinar, we will look at how you could interact with other databases before Polybase, and how it works now.


I am sometimes asked, “If a table has a lot of indexes and SQL Server is forced to analyze more options, would that slow down building a query plan?”

Well, it might, but what really confuses the optimizer is partitioning. Even the simplest queries on partitioned tables can lead to significant increases in CPU usage. Worse, as a famous philosopher once said: “More partitioning, more problems.”

So let’s take one of the Stack Overflow databases and create a partition function that splits our data by day:

USE StackOverflow;
GO

/* Create date partition function by day since Stack Overflow's origin,
modified from Microsoft Books Online: 
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-partition-function-transact-sql?view=sql-server-ver15#BKMK_examples
 
DROP PARTITION SCHEME [DatePartitionScheme];
DROP PARTITION FUNCTION [DatePartitionFunction];
*/
DECLARE @DatePartitionFunction nvarchar(max) = 
    N'CREATE PARTITION FUNCTION DatePartitionFunction (datetime) 
    AS RANGE RIGHT FOR VALUES (';  
DECLARE @i datetime = '2008-06-01';
WHILE @i <= GETDATE()
BEGIN  
SET @DatePartitionFunction += '''' + CAST(@i as nvarchar(20)) + '''' + N', ';  
SET @i = DATEADD(DAY, 1, @i);  
END  
SET @DatePartitionFunction += '''' + CAST(@i as nvarchar(20))+ '''' + N');';  
EXEC sp_executesql @DatePartitionFunction;  
GO  
 
/* Create matching partition scheme, but put everything in Primary: */
CREATE PARTITION SCHEME DatePartitionScheme  
AS PARTITION DatePartitionFunction  
ALL TO ( [PRIMARY] ); 
GO
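
If you want to check how many partitions that loop actually produced, a quick sketch like the one below reads the catalog view for the function. The only assumption is that the function was created in the current database under the name used above.

```sql
/* Sanity check: how many partitions did the function create?
   fanout is the number of partitions (boundary values + 1). */
SELECT pf.name,
       pf.fanout AS partition_count,
       pf.boundary_value_on_right   /* 1 means RANGE RIGHT */
FROM sys.partition_functions AS pf
WHERE pf.name = N'DatePartitionFunction';
```

With one boundary per day since mid-2008, expect a count in the thousands, which is exactly the kind of granularity this post is about.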

Next, let’s create a partitioned copy of the Users table, partitioned by the values in the CreationDate column:

DROP TABLE IF EXISTS dbo.Users_partitioned;
GO
CREATE TABLE [dbo].[Users_partitioned](
	[Id] [int] NOT NULL,
	[AboutMe] [nvarchar](max) NULL,
	[Age] [int] NULL,
	[CreationDate] [datetime] NOT NULL,
	[DisplayName] [nvarchar](40) NOT NULL,
	[DownVotes] [int] NOT NULL,
	[EmailHash] [nvarchar](40) NULL,
	[LastAccessDate] [datetime] NOT NULL,
	[Location] [nvarchar](100) NULL,
	[Reputation] [int] NOT NULL,
	[UpVotes] [int] NOT NULL,
	[Views] [int] NOT NULL,
	[WebsiteUrl] [nvarchar](200) NULL,
	[AccountId] [int] NULL
) ON [PRIMARY];
GO
 
CREATE CLUSTERED INDEX CreationDate_Id ON 
	dbo.Users_partitioned (Id)
	ON DatePartitionScheme(CreationDate);
GO
 
INSERT INTO dbo.Users_partitioned (Id, AboutMe, Age,
	CreationDate, DisplayName, DownVotes, EmailHash,
	LastAccessDate, Location, Reputation, UpVotes,
	Views, WebsiteUrl, AccountId)
SELECT Id, AboutMe, Age,
	CreationDate, DisplayName, DownVotes, EmailHash,
	LastAccessDate, Location, Reputation, UpVotes,
	Views, WebsiteUrl, AccountId
	FROM dbo.Users;
GO
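
To see how the rows ended up spread across the daily partitions, something like the following should work. It is only a sketch: it assumes the clustered index created above has index_id = 1, and the row counts in sys.partitions are approximate.

```sql
/* Show the ten fullest partitions of the clustered index.
   index_id = 1 is the clustered index; rows is an approximate count. */
SELECT TOP (10)
       p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Users_partitioned')
  AND p.index_id = 1
ORDER BY p.rows DESC;
```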

Let’s compare performance with and without partitioning

Let’s create a non-clustered index on both the Users and Users_partitioned tables. Note that even if you don’t ask for partitioning when creating the index on Users_partitioned, it will still be partitioned by default, which is very nice:

CREATE INDEX DisplayName ON dbo.Users(DisplayName);
CREATE INDEX DisplayName ON dbo.Users_partitioned(DisplayName);
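
If you would rather confirm that default than take my word for it, a small catalog query along these lines shows where each index on the table lives. This is a sketch, not part of the original test script:

```sql
/* For each index on the partitioned table, show its data space.
   type_desc is PARTITION_SCHEME for a partitioned index
   and ROWS_FILEGROUP for an index on a plain filegroup. */
SELECT i.name       AS index_name,
       ds.name      AS data_space_name,
       ds.type_desc AS data_space_type
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
  ON ds.data_space_id = i.data_space_id
WHERE i.object_id = OBJECT_ID(N'dbo.Users_partitioned');
```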

Now let’s run an extremely simple query that returns one row:

SET STATISTICS TIME, IO ON;
SELECT * FROM dbo.Users WHERE DisplayName = N'Brent Ozar';
GO
SELECT * FROM dbo.Users_partitioned WHERE DisplayName = N'Brent Ozar';
GO

The query plans look the same at first glance, but note that the query against the unpartitioned table is 0% of the total cost, while the query against the partitioned table is 100%:

This is because the estimated cost of the query against the unpartitioned table is well under 0.001, while the estimated cost of the query against the partitioned table is more than 15. Worse, the compile time, execution time, and logical reads are totally different. The screenshot below shows the statistics for the unpartitioned query at the top and the partitioned one at the bottom (extra information has been removed from the output for readability):

Just compiling the plan for the partitioned table took 27 ms of CPU time. I know what you’re thinking: “Who cares about 27 ms of CPU time?” But remember, this was a very simple query! In real life, it is normal for a plan to take over 250 ms of CPU time to compile. At that rate, one CPU core can compile only four queries per second. That’s when plan cache pollution from non-parameterized queries really ruins your life.
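
One common way to limit that pain is to parameterize queries so that a plan is compiled once and then reused for different values. As a minimal sketch (the parameter name @DisplayName is my own choice for illustration), the test query could be sent like this:

```sql
/* Parameterized version of the test query: the same cached plan
   can be reused for other display names instead of compiling a
   new plan for every literal value. */
EXEC sp_executesql
    N'SELECT * FROM dbo.Users_partitioned WHERE DisplayName = @DisplayName;',
    N'@DisplayName nvarchar(40)',
    @DisplayName = N'Brent Ozar';
```

This does not make the first compilation any cheaper; it only avoids paying for it again on every distinct literal.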

I was prompted to write this post by one of my clients, who decided to partition all tables regardless of their size while using the same partition granularity everywhere to keep his dynamic ETL queries simple. Unfortunately, even queries against simple configuration tables took over 250ms of CPU time just to compile the plan.
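
If you suspect the same thing is happening on your server, you can look for plans that were expensive to compile. The sketch below reads the CompileCPU and CompileTime attributes from the cached plan XML. It assumes you have VIEW SERVER STATE permission, and shredding plan XML can itself be slow on a large plan cache, so try it somewhere quiet first.

```sql
/* Cached plans ordered by the CPU time (ms) spent compiling them.
   CompileCPU and CompileTime are attributes of the QueryPlan
   element in showplan XML. */
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       st.text AS query_text,
       qp.query_plan.value('(//QueryPlan/@CompileCPU)[1]', 'int')  AS compile_cpu_ms,
       qp.query_plan.value('(//QueryPlan/@CompileTime)[1]', 'int') AS compile_time_ms
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan IS NOT NULL
ORDER BY compile_cpu_ms DESC;
```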

Ok, but what about unpartitioned indexes?

Let’s drop the partitioned index on the Users_partitioned table and create an unpartitioned one. Note that you must specify ON [PRIMARY] as the filegroup; otherwise, any non-clustered index on a partitioned table is automatically partitioned by default.

/* Drop the partitioned index, then recreate it on a plain filegroup */
DROP INDEX DisplayName ON dbo.Users_partitioned;
CREATE INDEX DisplayName ON dbo.Users_partitioned(DisplayName) ON [PRIMARY];
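
Before rerunning the test, it can be worth confirming that the non-clustered index really is a single structure now. A sketch that counts partitions per index on the table:

```sql
/* Count partitions per index: an unpartitioned index has exactly one,
   while the clustered index still has one per boundary range. */
SELECT i.name   AS index_name,
       COUNT(*) AS partition_count
FROM sys.indexes AS i
JOIN sys.partitions AS p
  ON p.object_id = i.object_id
 AND p.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.Users_partitioned')
GROUP BY i.name;
```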

And let’s run the queries again:

SET STATISTICS TIME, IO ON;
SELECT * FROM dbo.Users WHERE DisplayName = N'Brent Ozar';
GO
SELECT * FROM dbo.Users_partitioned WHERE DisplayName = N'Brent Ozar';
GO

Now the cost estimates are the same:

But still, this is only an estimate. The only thing that matches here is logical reads:

Compiling and executing a query plan takes longer if any of the objects involved are partitioned

This overhead is small compared with the cost of querying large data stores, where partitioning can reduce the number of reads. But on smaller objects (for example, rowstore indexes up to 100 GB) that are frequently hit by a variety of queries requiring new execution plans, the overhead of partitioning starts to add up. The more partitioned objects you have, and the more partitions in each object, the more problems.
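
For contrast, here is the kind of query where partitioning is meant to help: one that filters on the partitioning column, so the optimizer can skip partitions that cannot contain matching rows. This is a sketch against the demo table; the date is arbitrary, and you should check the actual plan to see how many partitions were really touched.

```sql
/* Which partition would hold a given date? (date chosen arbitrarily) */
SELECT $PARTITION.DatePartitionFunction(CAST('20180101' AS datetime)) AS partition_number;

/* A filter on the partitioning column lets the optimizer
   eliminate partitions outside the requested day. */
SELECT Id, DisplayName, CreationDate
FROM dbo.Users_partitioned
WHERE CreationDate >= '20180101'
  AND CreationDate <  '20180102';
```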

Partitioning is one of those features where I never hear anyone say, “Wow, no matter what I do, partitioning always improves performance!” More often, people keep saying, “I had no idea that partitioning could create problems THERE, too.”


