Multiple UNPIVOTs versus CROSS APPLY analyzed

A few days ago I posted a bookmark to a blog post where an example of multiple UNPIVOTs was given. I stated that such a scenario – UNPIVOT of multiple column groups which contain related data – is not optimal and that CROSS APPLY should be preferred.

To demonstrate this, let’s use the example from the original post and compare execution plans and execution statistics. The first data sample will contain three rows, the second 96 rows, and the third 3072. I am testing on SQL Server 2012 SP1 in a 4-core VM in my home lab.
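For reference, here is a minimal sketch of the source table. The exact definition comes from the original post, so the column types and sizes below are my assumptions:

CREATE TABLE dbo.EmploymentHistoryWithCity
(
    Id       int IDENTITY(1,1) NOT NULL,
    Employee varchar(100) NOT NULL,
    Company1 varchar(100) NULL,
    Company2 varchar(100) NULL,
    Company3 varchar(100) NULL,
    City1    varchar(100) NULL,
    City2    varchar(100) NULL,
    City3    varchar(100) NULL
);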

The queries to be compared are:

SELECT Id,
       ROW_NUMBER() OVER (ORDER BY Id) AS NewID,
       Employee,
       Company,
       City
FROM ( SELECT Id, Employee, Company1, Company2, Company3, City1, City2, City3
       FROM EmploymentHistoryWithCity ) Main
UNPIVOT ( Company FOR Companies IN (Company1, Company2, Company3) ) Sup
UNPIVOT ( City FOR Cities IN (City1, City2, City3) ) Ct
WHERE RIGHT(Companies, 1) = RIGHT(Cities, 1);

SELECT Id,
       ROW_NUMBER() OVER (ORDER BY Id) AS NewID,
       Employee,
       Company,
       City
FROM EmploymentHistoryWithCity
CROSS APPLY (VALUES ([Company1], [City1]),
                    ([Company2], [City2]),
                    ([Company3], [City3])) u(Company, City);

With just three records in the source table the execution statistics are practically identical, with the difference well within the error margin. Looking at the execution plans we spot the first difference (top – original UNPIVOT, bottom – CROSS APPLY):

[Execution plans: UNPIVOT (top) vs CROSS APPLY (bottom), 3 rows]

Since we use UNPIVOT twice, a join must be performed twice using the Nested Loops operator, whereas CROSS APPLY requires only one join. Additionally, in the UNPIVOT case a filter is needed to eliminate the extra rows that get generated, and it may have an impact on performance because the RIGHT(…) = RIGHT(…) condition has to be evaluated row by row; a small illustration of this follows.
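To see why that filter is needed, here is a minimal sketch on a hypothetical one-row inline table (made-up values, not the original data). The two stacked UNPIVOTs behave like a cross join of the two column groups, so a single source row yields 3 × 3 = 9 combinations, and the RIGHT(…) = RIGHT(…) predicate keeps only the 3 matching ones:

SELECT Company, City, Companies, Cities
FROM ( SELECT CAST('CompanyA' AS varchar(20)) AS Company1,
              CAST('CompanyB' AS varchar(20)) AS Company2,
              CAST('CompanyC' AS varchar(20)) AS Company3,
              CAST('CityA' AS varchar(20)) AS City1,
              CAST('CityB' AS varchar(20)) AS City2,
              CAST('CityC' AS varchar(20)) AS City3 ) Src
UNPIVOT ( Company FOR Companies IN (Company1, Company2, Company3) ) Sup
UNPIVOT ( City FOR Cities IN (City1, City2, City3) ) Ct;
-- Returns 9 rows; adding WHERE RIGHT(Companies, 1) = RIGHT(Cities, 1)
-- keeps only the 3 rows where the column-name suffixes match.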

Now let’s add more rows to the table to see how this affects query execution. Each run of the following batch doubles the row count, so GO 5 takes the table from 3 to 96 rows – 93 more:

INSERT INTO EmploymentHistoryWithCity
SELECT
[Employee]
,[Company1]
,[Company2]
,[Company3]
,[City1]
,[City2]
,[City3]
FROM [dbo].[EmploymentHistoryWithCity]
GO 5
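If you want to double-check, the row count after the batch should be 96:

SELECT COUNT(*) AS RowsInTable      -- expected: 96 after the five INSERT runs
FROM dbo.EmploymentHistoryWithCity;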

The execution takes a little bit longer but both queries are still on par with each other:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 106 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 107 ms.

The execution plans start to show a little difference:

[Execution plans: UNPIVOT vs CROSS APPLY, 96 rows]

The second join in the UNPIVOT query costs about 15% of the total query cost, which accounts for the difference between the queries. The second filter also comes into play – with 96 rows the first join puts 288 rows into the pipeline, but the second join multiplies that by three again, so 864 rows have to be filtered down. Again, let’s add more rows and see what happens – after re-running the insert batch above, the row count goes from 96 to 3072.

Statistics first:

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 339 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 159 ms.

And the execution plan:

[Execution plans: UNPIVOT vs CROSS APPLY, 3072 rows]

The difference is now clearly visible. The second Nested Loops operator dominates the query – it has to be executed for every row returned from the first UNPIVOT. We might help the optimizer a bit at this point by creating a clustered index on the primary key (the Id column) and then adding indexes on the Company* and City* fields to try to help the UNPIVOT operator.
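A sketch of the statements I mean here – the index names and the exact key columns are my own choices, as the original definitions are not listed:

ALTER TABLE dbo.EmploymentHistoryWithCity
    ADD CONSTRAINT PK_EmploymentHistoryWithCity PRIMARY KEY CLUSTERED (Id);

CREATE NONCLUSTERED INDEX IX_EmploymentHistoryWithCity_Companies
    ON dbo.EmploymentHistoryWithCity (Company1, Company2, Company3);

CREATE NONCLUSTERED INDEX IX_EmploymentHistoryWithCity_Cities
    ON dbo.EmploymentHistoryWithCity (City1, City2, City3);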

Creating the clustered index makes the difference even more visible. The statistics show that the UNPIVOT query now requires 303 ms, while CROSS APPLY finishes in 199 ms, yet the execution plan shows a 75% – 25% split. That’s because extracting the data from the table became cheaper while the join operations remained unaltered – we only made cheaper the part of the query that was already quick:

[Execution plans: UNPIVOT vs CROSS APPLY, 3072 rows, with clustered index]

Adding nonclustered indexes on the Company* and City* fields does not help either. Both joins have to be done in memory, and they have to be Nested Loops to preserve the UNPIVOT semantics – the values have to be scanned in order to produce the correct result.

Summarizing – if you have groups of columns that need to be unpivoted together (connected by a sort of join dependency), use CROSS APPLY with VALUES to do it. A separate question is the design of such a table – it is not in line with fifth normal form and should, where possible, be normalized into two tables. That allows for more optimal resource usage.
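As an illustration, one possible normalized design – this sketch is my own and not taken from the original post; each company/city pair becomes a row in a child table:

CREATE TABLE dbo.Employee
(
    Id           int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    EmployeeName varchar(100) NOT NULL
);

CREATE TABLE dbo.EmploymentHistory
(
    EmployeeId int NOT NULL REFERENCES dbo.Employee (Id),
    Company    varchar(100) NOT NULL,
    City       varchar(100) NOT NULL
);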
