Based on an article by Craig Freedman: Introduction to Parallel Query Execution
SQL Server can execute a single query simultaneously on multiple processors. This feature is called parallel query execution. Parallelism can be used to reduce the response time of a large query (that is, improve performance), and it can also be used to keep response times constant as both the workload and the number of processors grow (scaling). For most large queries, SQL Server scales almost linearly. The performance gain here means that if we double the number of processors, the response time drops by roughly half. Scaling here means that if we double both the number of processors and the size of the query, we get the same response time.
When is parallelism useful?
As I noted above, parallelism can be used to reduce the response time of a single query. However, parallelism has a cost: it adds overhead to query execution. Although this overhead is small, it makes parallelism undesirable for small queries (especially OLTP queries), where the overhead would be comparable to the total execution time. Moreover, if we compare the same small query executed in parallel on two processors with its sequential execution on one processor (without parallelism), the parallel run typically takes longer, often nearly twice as long. Again, this is due to the relatively high parallelism overhead for small queries.
First of all, parallelism is useful on servers that execute a relatively small number of concurrent queries. On such servers, parallelism allows those few queries to consume a large number of processors. On servers with a large number of concurrent queries (such as OLTP systems), there is less need for parallelism: all the processors will be utilized anyway, simply because the system has enough queries to keep them busy. Parallelizing these queries would only add overhead and degrade the overall performance of the system.
How does SQL Server parallelize queries?
SQL Server parallelizes a query by partitioning the input data horizontally into a number of roughly equal parts, one per available processor, and then performing the same operation (e.g., aggregation, join, etc.) on each part. Suppose we want to use two processors for a query with a hash aggregate that groups by an integer column. We create two threads (one for each processor). Each thread executes the same hash aggregate operator. The input data can be divided, for example, by the even or odd values of the grouping column. Since all rows belonging to the same group are routed to the same thread, each group is processed entirely by one hash aggregate operator, and the combined output is the correct result.
This parallel query execution method is both simple and highly scalable. In the example above, the two hash aggregate threads run completely independently: they do not need to exchange messages or coordinate their work. If we need a higher degree of parallelism (DOP), we simply add more threads and adjust the partitioning function accordingly (in practice, a hash function is used to partition rows for a hash aggregate; it works for any data type, any number of grouping columns, and any number of threads).
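As a concrete illustration, here is the kind of query that could produce the parallel hash aggregate described above (the Sales table is hypothetical, and whether the optimizer actually chooses a parallel plan depends on data volume and cost estimates):

```sql
-- Hypothetical table: Sales(CustomerId INT, Amount MONEY).
-- MAXDOP 2 caps the query at two threads per operator, matching the
-- two-processor example above; rows are hash-partitioned on CustomerId
-- so that every row of a given group reaches the same thread.
SELECT CustomerId, SUM(Amount) AS TotalAmount
FROM dbo.Sales
GROUP BY CustomerId
OPTION (MAXDOP 2);
```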
The actual partitioning and movement of data between threads is performed by the parallelism iterator (also called the exchange iterator). Although this iterator is rather special, it supports the same interfaces as any other iterator. Most iterators do not need to know that they are executing in parallel: the appropriate exchange iterators are simply added to the plan, and execution proceeds in parallel.
Please note that the method of parallelism described above is not the same as "pipelined" parallelism, in which several unrelated operators execute simultaneously on different threads. Although SQL Server often places different operators on different threads, the primary reason is to repartition the data as it flows from one operator to the next. With pure pipelining, the degree of parallelism and the total number of threads would be limited by the number of operators.
Who decides whether to parallelize a query?
The query optimizer decides whether to parallelize query execution. This decision, like most optimizer decisions, is based on cost: a complex, expensive query that processes many rows is far more likely to be parallelized than a simple query that processes very few rows.
Who determines the degree of parallelism (DOP)?
DOP is not part of the cached compiled query plan and may change on the next execution. It is determined at the start of execution, taking into account the number of processors on the server, the "max degree of parallelism" and "max worker threads" global configuration options set via sp_configure (visible only when "show advanced options" is enabled), and the MAXDOP query hint, if one is used. In short, the DOP is chosen to achieve parallelism without running out of worker threads. If MAXDOP 1 is specified, all exchange iterators are removed from the plan and the query executes as a serial plan on a single thread.
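For example, the server-wide cap and the per-query hint can be set as follows (the Sales table is hypothetical; the sp_configure options and the MAXDOP hint are standard T-SQL):

```sql
-- Server-wide cap on DOP; the option is advanced, so expose it first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;  -- 0 means "use all processors"
RECONFIGURE;

-- Per-query override: force a serial plan for this statement only.
SELECT COUNT(*) FROM dbo.Sales OPTION (MAXDOP 1);
```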
Note that the number of threads used by a parallel query may exceed DOP. If you monitor sys.sysprocesses while a parallel query executes, you may see more threads than DOP. As I said above, when data is repartitioned between two operators, those operators are placed on different threads: DOP specifies the number of threads per operator, not the total number of threads for the query execution plan. In SQL Server 2000, if DOP was less than the number of processors, the additional threads could use the remaining processors, which could effectively violate the specified MAXDOP setting. In SQL Server 2005, when a query executes with a given DOP, the number of schedulers is also limited: all threads used by the query are assigned to the same set of schedulers, so the query uses only the specified DOP number of processors, regardless of the total number of processors on the server.
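To see this for yourself, you can watch the worker threads of a session while it runs a parallel query; each thread appears in sys.sysprocesses as a separate row with its own execution context id (the spid value 52 below is just a placeholder for the session you are watching):

```sql
-- One row per worker thread of session 52; ecid = 0 is the coordinating
-- thread, ecid > 0 are the parallel workers. With a parallel plan, the
-- number of rows can exceed the query's DOP.
SELECT spid, ecid, status, kpid
FROM sys.sysprocesses
WHERE spid = 52;
```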