comparing data compression and backup compression
MSSQL supports on-the-fly compression of backups – lightweight and fast. You can also pack data inside the database using DATA_COMPRESSION = PAGE or ROW. As we remember, packed data is poorly packed. How will data packaging affect the backup size?
Test base
We create a large table test in a clean database (don’t forget to convert it to SIMPLE):
create table seed (n int)
GO
set nocount on
declare @n int=9999 while @n>=0 begin
insert into seed select @n
set @n=@n-1
end
GO
create table test (n int identity, a int, b int, str varchar(128))
GO
create clustered index PK_n on test (n)
GO
insert into test (a,b,str)
select L.n,R.n,'seed '+convert(varchar,L.n)+' and '+convert(varchar,R.n)
from seed L, seed R
GO
create index IXa on test (a)
GO
create index IXb on test (b)
GO
create index IXstr on test (str)
GO
Let's write down the sizes of indexes and backups – packed and regular. Then apply to all indexes DATA_COMPRESSION = PAGE
alter index PK_n on test REBUILD with (data_compression=PAGE)
alter index IXa on test REBUILD with (data_compression=PAGE)
alter index IXb on test REBUILD with (data_compression=PAGE)
alter index IXstr on test REBUILD with (data_compression=PAGE)
and we'll write everything down again.
Finally, apply DATA_COMPRESSION = ROW
alter index PK_n on test REBUILD with (data_compression=ROW)
alter index IXa on test REBUILD with (data_compression=ROW)
alter index IXb on test REBUILD with (data_compression=ROW)
alter index IXstr on test REBUILD with (data_compression=ROW)
results
All dimensions are given in Mb
Index | no compression | compression PAGE | compression ROW |
PK_n | 4355 | 1992 | 3646 |
IXa | 1355 | 854 | 1152 |
IXb | 1355 | 963 | 1152 |
IXstr | 3091 | 1239 | 3174 |
Now consider the size of full backups, also in megabytes:
Backup | no compression | compression PAGE | compression ROW |
no compression | 10417 | 5189 | 9364 |
compression | 1934 | 2331 | 2285 |
conclusions
I haven’t discovered America, the results are quite expected, but I wanted to make sure again
Packaging a backup always reduces its size, regardless of the type of packaging of the data inside, so this is useful.
As expected, the backup decreases the most if the data inside the database is not compressed (up to 5 times). In reality, this coefficient is usually 3-4
compression=PAGE greatly reduces the amount of data and increases the cache hits ratio. Obviously, at the cost of increased CPU
compression=ROW works worse and sometimes even increases the space occupied by data.