slow Index Scan

In my work, when I have to research and load a DBMS with atypical loads and synthetic tests, I often encounter cases of mysterious system behavior: acceleration/slowdown of query execution time by a couple of orders of magnitude, refusal to use a particular index, etc. The explanation for strange behavior ends up being almost always trivial and well known to an experienced DBA. However, when you encounter it in real use for the first time, you involuntarily get lost and it takes a lot of time to analyze the case. At the same time, this is a rather interesting exercise – like how to solve a problem book on aerodynamics after 10 years of designing airplane gliders. Therefore, I propose here to try the format of discussing/studying PostgreSQL in the form of tasks. Will he come in suddenly?

So, introductory notes for the first, simplest task.. We have a table and a pair of multicolumn indexes that differ from each other only in the order of the columns:

CREATE TEMP TABLE shopping (
  CustomerId bigint, CategoryId numeric, WeekDay text, Total money
);
INSERT INTO shopping (CustomerId, CategoryId, WeekDay, Total)
  SELECT gs % 1E5, gs % 100, 'Day' || (gs % 7), random()*1000::money
  FROM generate_series(1,1E6) AS gs;
CREATE INDEX idx1 ON shopping (CustomerId, CategoryId, WeekDay);
CREATE INDEX idx2 ON shopping (WeekDay, CategoryId, CustomerId);
VACUUM ANALYZE shopping;

If you execute a request for a simple scan by index with entry into Heap for the ' fieldTotal', then the query execution time may differ significantly, depending on which index is selected:

EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)
SELECT customerid, categoryid, weekday, total FROM shopping
ORDER BY CustomerId, CategoryId, WeekDay;
/*
 Index Scan using idx1 on shopping  
 (cost=0.42..64093.93 rows=1000000 width=25) (actual rows=1000000 loops=1)
 Planning Time: 0.108 ms
 Execution Time: 377.762 ms
*/

EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)
SELECT customerid, categoryid, weekday, total FROM shopping
ORDER BY WeekDay, CategoryId, CustomerId;
/*
 Index Scan using idx2 on shopping  
 (cost=0.42..63963.29 rows=1000000 width=25) (actual rows=1000000 loops=1)
 Planning Time: 0.127 ms
 Execution Time: 3700.215 ms
 */

The actual task: What is the reason for this slowdown? What did PostgreSQL do wrong? After all, the cost of a query plan is estimated by the planner to be approximately the same for both cases?

If there are ideas/projects/patches in the hackers mailing list related to this behavior, I suggest discussing them in the comments. It is also extremely interesting to know whether a similar case is observed in Oracle and MS SQL Server? And if not, how does it work for them?

THE END.
Thailand, Chon Buri, South Pattaya.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *