Subqueries in a CASE expression

Based on an article by Craig Freedman: Subqueries in CASE Expressions

This article will look at how SQL Server handles subqueries in a CASE expression. In addition, several exotic connection possibilities will be considered.

Scalar expressions

For simple use cases of CASE expressions (no subqueries), it is possible to evaluate only a CASE expression consisting of several scalar expressions:

create table   T1 (a int, b int,   c int)
select
    case
          when T1.a > 0   then
              T1.b
          else
              T1.c
    end
from T1

|–Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [T1].[a]>(0) THEN [T1].[b]

ELSE [T1].[c] END))

|–Table Scan(OBJECT:([T1]))

This query plan involves looking at table T1 and evaluating the CASE expression for each of its rows. The Compute Scalar statement evaluates the value of a CASE expression, including evaluating the condition and deciding whether the evaluation is performed in the THEN or ELSE clause.
If you put subqueries in the CASE expression, things get a little more complicated and much more interesting.

WHEN clause

Let’s first add a simple subquery to the WHEN clause:

create table   T2 (a int, b int)
select
    case
          when exists   (select * from   T2 where T2.a = T1.a) then
              T1.b
          else
              T1.c
    end
from T1

|–Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1010] THEN [T1].[b] ELSE [T1].[c] END))

|–Nested Loops(Left Semi Join, OUTER REFERENCES:([T1].[a]), DEFINE:([Expr1010] = [PROBE VALUE]))

|–Table Scan(OBJECT:([T1]))

|–Table Scan(OBJECT:([T2]), WHERE:([T2].[a]=[T1].[a]))

As with the other EXISTS subqueries, this plan uses a left semi-join to check if, for each row in T1, there is a corresponding row in T2. However, a normal semi-join (or anti-semi-join) returns only paired rows (or unpaired ones). In this case, at least something (T1.b or T1.c) must be returned for each row in T1. We cannot simply discard row T1 just because there is no corresponding row in T2 for it.
The solution was a special type of semi-join with a trial table column. This semi-join returns all outer matching or non-matching rows, and sets the trial table column (in our case this is [Expr1010]) to true or false, indicating whether the matching string T1 was found. After that, the CASE expression is evaluated using a sample table column to determine which value will be returned.

THEN offer

Let’s now try adding a simple subquery to the THEN clause:

create table   T3 (a int unique   clustered, b int)
insert T1 values(0, 0,   0)
insert T1 values(1, 1,   1)

select
    case
          when T1.a > 0   then
              (select T3.b from   T3 where T3.a = T1.b)
          else
              T1.c
    end
from T1

I added a unique constraint to T3 to ensure that the scalar subquery returns only one row. Without limitation, the query plan would be more complex because the optimizer would have to guarantee that the subquery would indeed only return one row, and would have to throw an error if more than one row was returned.
I also added two more lines to T1, moreover, the condition in the WHEN clause will return false for the first line and true for the second line. Thus, the first line will be suitable for ELSE, and the second for THEN. Note that the subquery value in THEN will only be used if the WHEN clause is true.
The statistics profile for this query’s execution plan is shown below:

RowsExecutes

0 0 |–Compute Scalar(DEFINE:([Expr1008]=CASE WHEN [T1].[a]>(0)

THEN [T3].[b]

ELSE [T1].[c] END))

2 1 |–Nested Loops(Left Outer Join, PASSTHRU:

(IsFalseOrNull [T1].[a]>(0)),

OUTER REFERENCES:([T1].[b]))

2 1 |–Table Scan(OBJECT:([T1]))

0 1 |–Clustered Index Seek(OBJECT:([T3].[UQ__T3__412EB0B6]),

SEEK:([T3].[a]=[T1].[b])

ORDERED FORWARD)

This query plan uses a special type of nested loop join that uses the PASSTHRU predicate. The join evaluates the PASSTHRU predicate for each outer row. If the PASSTHRU predicate evaluates to true, the join immediately returns a string like a semi-join or outer join. If the PASSTHRU predicate evaluates to false, the join is performed in the usual way, i.e. an attempt is made to join the outer string with the inner string.
In the example above, the PASSTHRU predicate of the CASE expression is the inverse (notice the IsFalseOrNull function) of the WHEN clause. If the WHEN clause evaluates to true, the PASSTHRU predicate evaluates to false, a join occurs, and the search on the inside of the join evaluates the THEN subquery. If the WHEN clause evaluates to false, the PASSTHRU predicates evaluate to true, the join is skipped, and the search or THEN subquery is not executed.
Note that T1’s lookup returns 2 rows, even though T3’s lookup is only done once. This is because in our example, the WHEN clause is true for only one of the two rows. The PASSTHRU predicate is the only mechanism when the number of rows on the outside of a nested loop join does not exactly match the number of rows on the inside.
Also note that once an outer join is used, there is no guarantee that the subquery in THEN will return anything (in fact, it is only guaranteed that no more than one row will be returned due to the unique constraint). If the subquery returns no rows, the outer join will simply return NULL for T3.b. If an inner join were to be used, it would be wrong to discard the T1 row. Caveat: I ran these examples on SQL Server 2005. If you run this example on SQL Server 2000, the PASSTHRU predicate will be visible, but it will appear as a regular WHERE clause predicate in the query execution plan. Unfortunately, there is no easy way for SQL Server 2000 to distinguish between regular predicates and the PASSTHRU predicate.

ELSE clause and multiple WHEN clauses

A subquery in an ELSE clause works exactly like a subquery in a THEN clause. The PASSTHRU predicate will be used to evaluate the subquery condition.
Similarly, a CASE statement with multiple WHEN clauses, with subqueries in each THEN clause, will work similarly. The only difference is that there will be more PASSTHRU predicates.
For example:

create table   T4 (a int unique   clustered, b int)
create table   T5 (a int unique   clustered, b int)

select
    case
          when T1.a > 0   then
              (select T3.b from   T3 where T3.a = T1.a)
          when T1.b > 0   then
              (select T4.b from   T4 where T4.a = T1.b)
          else
              (select T5.b from T5   where T5.a = T1.c)
    end
from T1

|–Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [T1].[a]>(0)

THEN [T3].[b]

ELSE CASE WHEN [T1].[b]>(0)

THEN [T4].[b]

ELSE [T5].[b]

END

END))

|–Nested Loops(Left Outer Join, PASSTHRU:([T1].[a]>(0)OR [T1].[b]>(0)),

OUTER REFERENCES:([T1].[c]))

|–Nested Loops(Left Outer Join, PASSTHRU:([T1].[a]>(0)

OR IsFalseOrNull [T1].[b]>(0)),

OUTER REFERENCES:([T1].[b]))

| |–Nested Loops(Left Outer Join, PASSTHRU:

(IsFalseOrNull [T1].[a]>(0)),

OUTER REFERENCES:([T1].[a]))

| | |–Table Scan(OBJECT:([T1]))

| | |–Clustered Index Seek(OBJECT:([T3].[UQ__T3__164452B1]),

SEEK:([T3].[a]=[T1].[a])

ORDERED FORWARD)

| |–Clustered Index Seek(OBJECT:([T4].[UQ__T4__182C9B23]),

SEEK:([T4].[a]=[T1].[b])

ORDERED FORWARD)

|–Clustered Index Seek(OBJECT:([T5].[UQ__T5__1A14E395]),

SEEK:([T5].[a]=[T1].[c])

ORDERED FORWARD)

This query plan has three nested loop joins with PASSTHRU predicates. For each T1 row, only one of the three PASSTHRU predicates evaluates to true, and only one of the three subqueries will be executed. Note that while the second WHEN clause matches “T1.b > 0”, this means that the first WHEN clause where “T1.a > 0” is false. This also applies to the ELSE clause. Thus, the PASSTHRU predicates for the second and third subqueries include the check “T1.a > 0 OR…”.

Sample table column as PASSTHRU predicate

Finally, let’s look at a query with subqueries in WHEN clauses and in THEN clauses. Also, for a change, let’s move the CASE expression from the SELECT list to the WHERE clause.

select *
from T1
where 0   =
    case
          when exists   (select * from   T2 where T2.a = T1.a) then
              (select T3.b from   T3 where T3.a = T1.b)
          else
              T1.c
    end

|–Filter(WHERE:((0)=CASE WHEN [Expr1013]

THEN [T3].[b]

ELSE [T1].[c]

END))

|–Nested Loops(Left Outer Join, PASSTHRU:(IsFalseOrNull [Expr1013]),

OUTER REFERENCES:([T1].[b]))

|–Nested Loops(Left Semi Join, OUTER REFERENCES:([T1].[a]),

DEFINE:([Expr1013] = [PROBE VALUE]))

| |–Table Scan(OBJECT:([T1]))

| |–Table Scan(OBJECT:([T2]), WHERE:([T2].[a]=[T1].[a]))

|–Clustered Index Seek(OBJECT:([T3].[UQ__T3__164452B1]),

SEEK:([T3].[a]=[T1].[b])

ORDERED FORWARD)

This query execution plan has a left semi-join with the sample table column to evaluate the subquery in the WHEN clause, and a nested loops join with the PASSTHRU predicate on the sample table column to decide whether to evaluate the subquery in the THEN clause. Because the CASE expression has been moved to the WHERE clause, the Filter statement is used to evaluate the output values ​​from the SELECT list instead of the Compute Scalar statement, which determines which rows will be returned. Everything else works exactly the same.

Further…

In the next article, I’ll look at several other types of subqueries.

Similar Posts

Leave a Reply Cancel reply