Multiple UNPIVOTs versus CROSS APPLY analyzed

A few days ago I posted a bookmark to a blog post where an example of multiple UNPIVOTs was given. I stated that such a scenario – UNPIVOT of multiple groups of columns that contain related data – is not optimal and that CROSS APPLY should be preferred.

To demonstrate this, let's use the example from the original post and compare execution plans and execution statistics. The first data sample will contain three rows, the second 96 rows, and the third 3072. I am testing on SQL Server 2012 SP1 in a 4-core VM in my home lab.
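For reference, the source table from the original post looks roughly like this – a minimal sketch with assumed data types and made-up sample values:

CREATE TABLE dbo.EmploymentHistoryWithCity (
    Id INT IDENTITY(1,1),    -- later used for the clustered primary key
    Employee VARCHAR(500),
    Company1 VARCHAR(500), Company2 VARCHAR(500), Company3 VARCHAR(500),
    City1 VARCHAR(500), City2 VARCHAR(500), City3 VARCHAR(500)
)
GO
INSERT INTO dbo.EmploymentHistoryWithCity (Employee, Company1, Company2, Company3, City1, City2, City3)
VALUES ('John', 'CompanyA1', 'CompanyA2', 'CompanyA3', 'CityA1', 'CityA2', 'CityA3'),
       ('Anna', 'CompanyB1', 'CompanyB2', 'CompanyB3', 'CityB1', 'CityB2', 'CityB3'),
       ('Mark', 'CompanyC1', 'CompanyC2', 'CompanyC3', 'CityC1', 'CityC2', 'CityC3')
GO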

The queries to be compared are:

SELECT Id,
       ROW_NUMBER() OVER (ORDER BY Id) AS NewID,
       Employee,
       Company,
       City
FROM (SELECT Id, Employee, Company1, Company2, Company3, City1, City2, City3
      FROM EmploymentHistoryWithCity) Main
UNPIVOT (Company FOR companies IN (Company1, Company2, Company3)) Sup
UNPIVOT (City FOR Cities IN (City1, City2, City3)) Ct
WHERE RIGHT(companies, 1) = RIGHT(Cities, 1)

SELECT Id,
       ROW_NUMBER() OVER (ORDER BY Id) AS NewID,
       Employee,
       Company,
       City
FROM EmploymentHistoryWithCity
CROSS APPLY (VALUES ([Company1], [City1]), ([Company2], [City2]), ([Company3], [City3])) u(Company, City)

With just three records in the source table the execution statistics are just about the same, with the difference being within the error margin. Looking at the execution plans we spot the first difference (top – original UNPIVOT, bottom – CROSS APPLY):

cross apply vs unpivot_3 rows

Since we use UNPIVOT twice, the join must be performed twice using the Nested Loops operator, whereas CROSS APPLY requires only one join. Additionally, in the UNPIVOT case a filter must be applied to eliminate the extra rows that would otherwise be generated, and it might have an impact on performance because the RIGHT(…) = RIGHT(…) condition has to be evaluated row by row.
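To see why that filter is needed, here is the same double UNPIVOT with the WHERE clause removed – just a sketch to illustrate the row multiplication:

SELECT Id, Employee, companies, Company, Cities, City
FROM (SELECT Id, Employee, Company1, Company2, Company3, City1, City2, City3
      FROM EmploymentHistoryWithCity) Main
UNPIVOT (Company FOR companies IN (Company1, Company2, Company3)) Sup
UNPIVOT (City FOR Cities IN (City1, City2, City3)) Ct
-- every company gets paired with every city, so each source row produces 3 x 3 = 9 rows;
-- the RIGHT(companies, 1) = RIGHT(Cities, 1) predicate keeps only the pairs whose column
-- names end with the same digit, bringing it back down to 3 rows per source row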

Let's add more rows to the table and see how it affects the query execution. The following INSERT doubles the table each time it runs, so executing it five times (GO 5) takes us from 3 rows to 96 – 93 more rows:

INSERT INTO EmploymentHistoryWithCity
SELECT
[Employee]
,[Company1]
,[Company2]
,[Company3]
,[City1]
,[City2]
,[City3]
FROM [dbo].[EmploymentHistoryWithCity]
GO 5

The execution takes a little longer, but both queries are still on par with each other:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 106 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 107 ms.

The execution plans start to show a little difference:

cross apply vs unpivot_96 rows

The second join in the UNPIVOT query accounts for about 15% of the total query cost, which makes for the difference between the queries. The second filter also comes into play – with 96 source rows the first join puts 288 rows in the pipeline, but the second join multiplies that by three again, so 864 rows have to be filtered down to the final 288. Again, let's add more rows and see what happens – after running the insert batch above once more, the row count goes from 96 to 3072.

Statistics first:

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 339 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 159 ms.

And the execution plan:

cross apply vs unpivot_3072 rows

The difference is now clearly visible. The second Nested Loops operator has come to dominate the query – it has to be executed for every row returned from the first UNPIVOT. We might help the optimizer a bit at this point: first create a clustered index via a primary key on the Id column, and then try adding indexes on the Company* and City* fields to help the UNPIVOT.
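The statements look roughly like this (a sketch; the index and constraint names are mine, for illustration):

-- clustered index via a primary key on the Id column
ALTER TABLE dbo.EmploymentHistoryWithCity
    ADD CONSTRAINT PK_EmploymentHistoryWithCity PRIMARY KEY CLUSTERED (Id)
GO
-- nonclustered indexes on the Company* and City* columns
CREATE NONCLUSTERED INDEX IX_EHWC_Company1 ON dbo.EmploymentHistoryWithCity (Company1)
CREATE NONCLUSTERED INDEX IX_EHWC_City1 ON dbo.EmploymentHistoryWithCity (City1)
-- ...and likewise for Company2/Company3 and City2/City3
GO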

The creation of the clustered index makes the difference even more visible. The statistics show that the UNPIVOT version now takes 303 ms, while CROSS APPLY finishes in 199 ms, yet the execution plan shows a 75% / 25% split. That is because data extraction from the table became cheaper while the join work stayed the same – we only made cheaper the part of the query that was already quick:

cross apply vs unpivot_3072 rows_clustered index

Adding nonclustered indexes on the Company and City fields does not help either. Both joins still have to be done in memory, and they have to be Nested Loops to preserve the UNPIVOT semantics – the values have to be scanned in order to produce the correct result.

Summarizing – if you have groups of columns that need to be UNPIVOTed together (connected by a sort of join dependency), use CROSS APPLY with VALUES to do it. The other question is the design of such a table – it is not in line with fifth normal form and, where required, should be normalized into separate tables, which will allow for more optimal resource usage.
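A normalized variant could look roughly like this (table and column names are mine, purely for illustration):

CREATE TABLE dbo.Employee (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    Employee VARCHAR(500)
)
CREATE TABLE dbo.Employment (
    EmployeeId INT NOT NULL REFERENCES dbo.Employee (Id),
    Company VARCHAR(500),
    City VARCHAR(500)
)
-- one row per employment instead of the Company1..3 / City1..3 column groups,
-- so neither UNPIVOT nor CROSS APPLY is needed to get the data into this shape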

How to UNPIVOT table in SQL Server / UNPIVOT table example in SQL Server

Recently I had a similar case where I needed to unpivot multiple related columns at the same time. It turned out that running multiple UNPIVOTs is actually slower and creates a more complex execution plan than a single CROSS APPLY, and the difference grows more significant as the row count grows.

Since I was unable to correctly format the post to display SQL queries inside, the details are available in another post.

Tuvian: Discussion forum for ASP.Net, C#, Ajax, jQuery, JavaScript, HTML, CSS

What are the PIVOT and UNPIVOT operators in SQL Server?

We can use the PIVOT and UNPIVOT relational operators to change a table-valued expression into another table. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output. UNPIVOT performs the opposite operation, rotating columns of a table-valued expression into column values.

Please go through our previous post for a PIVOT sample in SQL.

Before UNPIVOT

After UNPIVOT

A simple UNPIVOT sample in SQL Server

Here we are going to demonstrate a very simple UNPIVOT sample without any complexity. We have a table named EmploymentHistory containing each employee's name along with the names of the previous companies that employee worked for.

Table structure for the simple UNPIVOT sample

CREATE TABLE EmploymentHistory (
    Id INT,
    Employee VARCHAR(500),
    Company1 VARCHAR(500),
    Company2 VARCHAR(500),
    Company3 VARCHAR(500)
)
GO
-- Load Sample data
INSERT…
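The rest of the sample is truncated above; a minimal sketch of the kind of UNPIVOT it builds up to could look like this (my reconstruction, not the original author's code):

SELECT Id, Employee, Company
FROM (SELECT Id, Employee, Company1, Company2, Company3
      FROM EmploymentHistory) p
UNPIVOT (Company FOR Companies IN (Company1, Company2, Company3)) AS unpvt
-- each source row with three company columns becomes three rows with a single Company column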


How to get CLR file contents from database (and convert from varbinary to string)

Imagine the following situation – you have a procedure that uses a CLR function to perform some data manipulation. You suspect the CLR function to be sub-optimal, but you don't have the source and cannot confirm your claim. If you are lucky, you can get its contents by scripting the whole assembly – if it's in a DLL, you might try to reverse-engineer it using .NET Reflector. If it's a separate file (a C# source file, for example), it is scripted in binary format and you have to unscramble it. This is my method of doing it:

;WITH s AS (
    -- paste the scripted '0x...' hex representation of the file here
    SELECT 'here comes binary representation of file' s),
num AS (
    -- auxiliary number list (not referenced by the final query)
    SELECT v1.number + v2.number * 100 + v3.number * 10000 + v4.number * 1000000 number
    FROM master.dbo.spt_values v1
    CROSS JOIN master.dbo.spt_values v2
    CROSS JOIN master.dbo.spt_values v3
    CROSS JOIN master.dbo.spt_values v4
    WHERE v1.number < 100 AND v1.type = 'P' AND
          v2.number < 100 AND v2.type = 'P' AND
          v3.number < 100 AND v3.type = 'P' AND
          v4.number < 100 AND v4.type = 'P'),
ch AS (
    -- walk the hex string two characters (one byte) at a time, converting each pair to a character
    SELECT CHAR(CONVERT(int, CONVERT(VARBINARY, '0x' + SUBSTRING(s, 3, 2), 1))) ch, 0 AS i FROM s
    UNION ALL
    SELECT CHAR(CONVERT(int, CONVERT(VARBINARY, '0x' + SUBSTRING(s, 3 + 2*(i+1), 2), 1))), i + 1
    FROM s INNER JOIN ch ON 1 = 1
    WHERE 2 + 2*(i+1) < LEN(s))
-- concatenate the single characters back into one string
SELECT REPLACE(REPLACE(REPLACE(ch, '</ch><ch>', ''), '</ch>', ''), '<ch>', '') FROM ch FOR XML PATH('')
OPTION (MAXRECURSION 0)

You may also try this when facing the task of converting from varbinary to string. As an example, the binary string 0xEFBBBF7573696E672053797374656D3B0D0A7573696E672053797374656D2E446174613B0D0A7573696E672053797374656D2E446174612E53716C436C69656E743B0D0A7573696E672053797374656D2E446174612E53716C54797065733B0D0A7573696E67204D6963726F736F66742E53716C5365727665722E5365727665723B0D is a snippet of actual C# code, so running this query

;WITH s AS (
    SELECT '0xEFBBBF7573696E672053797374656D3B0D0A7573696E672053797374656D2E446174613B0D0A7573696E672053797374656D2E446174612E53716C436C69656E743B0D0A7573696E672053797374656D2E446174612E53716C54797065733B0D0A7573696E67204D6963726F736F66742E53716C5365727665722E5365727665723B0D' s),
num AS (
    SELECT v1.number + v2.number * 100 + v3.number * 10000 + v4.number * 1000000 number
    FROM master.dbo.spt_values v1
    CROSS JOIN master.dbo.spt_values v2
    CROSS JOIN master.dbo.spt_values v3
    CROSS JOIN master.dbo.spt_values v4
    WHERE v1.number < 100 AND v1.type = 'P' AND
          v2.number < 100 AND v2.type = 'P' AND
          v3.number < 100 AND v3.type = 'P' AND
          v4.number < 100 AND v4.type = 'P'),
ch AS (
    SELECT CHAR(CONVERT(int, CONVERT(VARBINARY, '0x' + SUBSTRING(s, 3, 2), 1))) ch, 0 AS i FROM s
    UNION ALL
    SELECT CHAR(CONVERT(int, CONVERT(VARBINARY, '0x' + SUBSTRING(s, 3 + 2*(i+1), 2), 1))), i + 1
    FROM s INNER JOIN ch ON 1 = 1
    WHERE 2 + 2*(i+1) < LEN(s))
SELECT REPLACE(REPLACE(REPLACE(ch, '</ch><ch>', ''), '</ch>', ''), '<ch>', '') FROM ch FOR XML PATH('')
OPTION (MAXRECURSION 0)

will give

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

If you have any questions/suggestions, please let me know.

EDIT:
It's worth reading Books Online. On the CAST and CONVERT reference page, example H, there is exactly the same example of conversion from binary to string, and they do it in one line:

SELECT convert(varchar(max), binary_string, 0)
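For instance, to pull the scripted source files straight from the catalog and convert them in one go (the assembly name below is hypothetical):

SELECT af.name,
       CONVERT(varchar(max), af.content, 0) AS file_text
FROM sys.assembly_files af
JOIN sys.assemblies a ON a.assembly_id = af.assembly_id
WHERE a.name = 'MyClrAssembly'   -- hypothetical assembly name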

Thank you and good night. 🙂

SQL Server Hardware Choices Made Easy by Glenn Berry – a quick review

I came across Glenn Berry's e-book by accident on Twitter a while ago, downloaded it and liked it very much. I read it over one evening, and the first thing I did the next day after arriving at work was forward it to all my coworkers.

What was most interesting for me (as I share Glenn's interest in hardware) was the detailed description of several interesting configurations using current hardware, as well as some future trends (like enterprise SSD drives, AMD Bulldozer or the new Intel CPUs). It picks up where most books and guides leave you – for instance, where most authors tell you to set up RAID-1 for logs and RAID-10 for database files, Glenn gives you an overview of which actual models and exact configurations may be used and what they cost. Furthermore, he shows which CPUs, and in which configuration, are preferred depending on your workload (OLTP differs from OLAP). You will see that RAM is cheap and you don't have to skimp on it. Last but not least, you will get a pocket-friendly suggestion on how to build a workstation for SQL Server.

If you have second thoughts while picking hardware for your new SQL Server, if you have always bought 1-2-3-style machines (CPU/RAM/HDD count), or if you're simply interested in computer hardware and databases, this book is for you. Trust me, you won't be disappointed.

[Story] Terrible performance problem

Hello again and welcome to a new series categorized informally as “DBA stories”. Here is what it means – stories from real life depicting situations that are strange, bizarre, sometimes funny (sometimes not), but somehow interesting to a DBA. The choice is arbitrary – as long as I have my own stories, I will try not to include stories posted by someone else. And remember, these things really happened, so they might as well happen to you!

OK, so to begin with, a situation that once made me stay up the whole night. I got a phone call from a help desk agent around 11 pm saying that one of the mission-critical systems was suffering from poor performance. OK, I said, let me see – I logged on and there it was: CPU load at 100%, with SQL Server barely responding. The system uses transactional replication to maintain a warm standby, and I started thinking of invoking business continuity procedures. Then I saw that a 10 GB database had 40 GB of log, growing by 1 GB every half an hour. So, it was a database problem, and I started digging.

Restarting the SQL Server service didn't help, of course – it didn't remove the source of the problem. So I started looking for the source with Profiler, and then I found it.

The clue was that one of the components had received a patch with new functionality which – from my (DBA) point of view – took all records in one of the tables and, in a loop, updated all records in each iteration (for each record an UPDATE was issued without a WHERE clause). There were about 5000 records in this table, so each execution of this procedure updated roughly 5000 x 5000 = 25 million rows, and that's not the end of it. This procedure was run once a minute! That's why the log was growing so fast and replication simply could not keep up either.
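In T-SQL terms the pattern looked something like this – a hypothetical reconstruction with made-up table and column names, not the vendor's actual code:

DECLARE @Id INT
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT Id FROM dbo.SomeTable   -- ~5000 rows
OPEN c
FETCH NEXT FROM c INTO @Id
WHILE @@FETCH_STATUS = 0
BEGIN
    -- UPDATE issued without a WHERE clause: every iteration touches all ~5000 rows
    UPDATE dbo.SomeTable SET SomeColumn = SomeColumn
    FETCH NEXT FROM c INTO @Id
END
CLOSE c
DEALLOCATE c
-- ~5000 iterations x ~5000 rows per UPDATE = ~25 million logged row updates per run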

Once the problem was found, one quick call to the application support team with a request to disable this functionality solved it immediately. How did it pass QA, you might ask? Well, there are two morals:

  1. Keep your QA environment as similar to production as possible.
  2. Don’t trust your developers – have it checked two or three times.