Multiple UNPIVOTs versus CROSS APPLY analyzed

A few days ago I posted a link to a blog post showing an example of multiple UNPIVOTs. I stated that such a scenario – UNPIVOTing multiple groups of columns that contain related data – is not optimal and that CROSS APPLY should be preferred.

To demonstrate this, let’s use the example from the original post and compare execution plans and execution statistics. The first data sample will contain three rows, the second 96 rows and the third 3072. I am testing on SQL Server 2012 SP1 on a 4-core VM in my home lab.
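The original post (reblogged further down) only shows an EmploymentHistory table without the city columns, so here is a minimal sketch of the EmploymentHistoryWithCity table assumed throughout this post – the column types, the IDENTITY property and the sample values are my assumptions, derived from the column names used in the queries below:

-- Hedged sketch of the test table (types and IDENTITY are assumptions, not from the original post)
CREATE TABLE EmploymentHistoryWithCity (
    Id       INT IDENTITY(1,1) NOT NULL,
    Employee VARCHAR(500),
    Company1 VARCHAR(500), Company2 VARCHAR(500), Company3 VARCHAR(500),
    City1    VARCHAR(500), City2    VARCHAR(500), City3    VARCHAR(500)
);
GO

-- Three seed rows with made-up values for the first test
INSERT INTO EmploymentHistoryWithCity (Employee, Company1, Company2, Company3, City1, City2, City3)
VALUES ('Employee A', 'Company A1', 'Company A2', 'Company A3', 'City A1', 'City A2', 'City A3'),
       ('Employee B', 'Company B1', 'Company B2', 'Company B3', 'City B1', 'City B2', 'City B3'),
       ('Employee C', 'Company C1', 'Company C2', 'Company C3', 'City C1', 'City C2', 'City C3');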

The queries to be compared are:

SELECT Id,
       ROW_NUMBER() OVER (ORDER BY Id) AS NewID,
       Employee,
       Company,
       City
FROM ( SELECT Id, Employee, Company1, Company2, Company3, City1, City2, City3
       FROM EmploymentHistoryWithCity ) Main
UNPIVOT ( Company FOR companies IN (Company1, Company2, Company3) ) Sup
UNPIVOT ( City FOR cities IN (City1, City2, City3) ) Ct
WHERE RIGHT(companies, 1) = RIGHT(cities, 1)

SELECT Id,
       ROW_NUMBER() OVER (ORDER BY Id) AS NewID,
       Employee,
       Company,
       City
FROM EmploymentHistoryWithCity
CROSS APPLY (VALUES ([Company1], [City1]), ([Company2], [City2]), ([Company3], [City3])) u(Company, City)

With just three records in the source table the execution statistics are about the same, with the difference falling within the error margin. Looking at the execution plans we spot a first difference (top – original UNPIVOT, bottom – CROSS APPLY):

[Figure: execution plans – UNPIVOT (top) vs CROSS APPLY (bottom), 3 rows]

Since we use UNPIVOT twice, the join must also be performed twice, each time using a Nested Loops operator, whereas CROSS APPLY requires only one join. Additionally, in the UNPIVOT case a filter is needed to eliminate the extra rows that get generated, and it may have an impact on performance, since the RIGHT(…) = RIGHT(…) condition has to be evaluated row by row.
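Here is a quick standalone single-row illustration (with hypothetical literal values, not the original sample data): the double UNPIVOT first produces 3 × 3 = 9 (Company, City) combinations, and the RIGHT() = RIGHT() predicate – which compares the trailing digit of the generated source column names – trims them back to the 3 matching pairs. Comment the WHERE clause out to see all 9 rows.

SELECT Company, City
FROM ( SELECT CAST('Company A' AS VARCHAR(500)) AS Company1,
              CAST('Company B' AS VARCHAR(500)) AS Company2,
              CAST('Company C' AS VARCHAR(500)) AS Company3,
              CAST('City A' AS VARCHAR(500)) AS City1,
              CAST('City B' AS VARCHAR(500)) AS City2,
              CAST('City C' AS VARCHAR(500)) AS City3 ) t
UNPIVOT ( Company FOR companies IN (Company1, Company2, Company3) ) Sup
UNPIVOT ( City FOR cities IN (City1, City2, City3) ) Ct
-- keeps only the pairs coming from the same column group: (Company1, City1), (Company2, City2), (Company3, City3)
WHERE RIGHT(companies, 1) = RIGHT(cities, 1)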

Now let’s add more rows to the table and see how it affects query execution. The following query doubles the row count each time it runs; the GO 5 batch separator executes it five times, taking the table from 3 to 96 rows (93 more rows):

INSERT INTO EmploymentHistoryWithCity
SELECT
[Employee]
,[Company1]
,[Company2]
,[Company3]
,[City1]
,[City2]
,[City3]
FROM [dbo].[EmploymentHistoryWithCity]
GO 5

The execution takes a little bit longer but both queries are still on par with each other:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 106 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 107 ms.

The execution plans start to show a little difference:

[Figure: execution plans – UNPIVOT vs CROSS APPLY, 96 rows]

The second join in the UNPIVOT query accounts for about 15% of the total query cost, which makes for the difference between the two queries. The second filter also comes into play – with 96 rows the first join puts 288 rows in the pipeline, but the second join multiplies that by three again, so 864 rows have to be filtered. Again, let’s add more rows and see what happens – after running the insert query above once more (again with GO 5), the row count goes from 96 to 3072.

Statistics first:

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 339 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 159 ms.

And the execution plan:

[Figure: execution plans – UNPIVOT vs CROSS APPLY, 3072 rows]

The difference is now clearly visible. The second Nested Loops operator dominates the query – it has to be executed for every row returned by the first UNPIVOT. We might help the optimizer a bit at this point: first create a clustered index through a primary key on the Id column, and then try adding indexes on the Company* and City* columns to help the UNPIVOT operator.
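The exact DDL is not shown in this post; a minimal sketch of the indexes being tested, with the names and column choices being my assumption:

-- Clustered index created through a primary key on Id (assumes Id is unique and NOT NULL)
ALTER TABLE EmploymentHistoryWithCity
    ADD CONSTRAINT PK_EmploymentHistoryWithCity PRIMARY KEY CLUSTERED (Id);
GO

-- Hypothetical nonclustered indexes on the unpivoted columns
-- (shown for the first column of each group; repeat for Company2/3 and City2/3)
CREATE NONCLUSTERED INDEX IX_EHWC_Company1 ON EmploymentHistoryWithCity (Company1);
CREATE NONCLUSTERED INDEX IX_EHWC_City1 ON EmploymentHistoryWithCity (City1);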

The creation of the clustered index makes the difference even more visible. The statistics show that UNPIVOT now requires 303 ms, while CROSS APPLY completes in 199 ms, yet the execution plan shows a 75% – 25% split. That’s because extracting the data from the table became cheaper while the join operations remained unchanged – we only made cheaper the part of the query that was already quick:

[Figure: execution plans – UNPIVOT vs CROSS APPLY, 3072 rows, with clustered index]

Adding nonclustered indexes on the Company* and City* columns does not help either. This is because both joins have to be done in memory and they have to be Nested Loops to preserve the UNPIVOT semantics – the values have to be scanned in order to produce the correct result.

Summarizing – if you have groups of columns that need to be UNPIVOTed together (connected by a sort of join dependency), use CROSS APPLY with VALUES to do it. The other question is the design of such a table – it is not in line with fifth normal form and should be normalized into two tables (if required); this will also allow for more optimal resource usage.
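A minimal sketch of what such a normalized design could look like – the table and column names are mine, not taken from the original post:

-- One row per employee
CREATE TABLE Employee (
    Id   INT IDENTITY(1,1) PRIMARY KEY,
    Name VARCHAR(500) NOT NULL
);

-- One row per employment instead of the Company1..3 / City1..3 column groups
CREATE TABLE Employment (
    EmployeeId INT NOT NULL REFERENCES Employee (Id),
    Company    VARCHAR(500) NOT NULL,
    City       VARCHAR(500) NOT NULL
);

-- With this design no UNPIVOT or CROSS APPLY is needed at all:
-- SELECT e.Name, em.Company, em.City FROM Employee e JOIN Employment em ON em.EmployeeId = e.Id;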


Monitoring native Windows deduplication job progress with PowerShell

I have been a big fan of Windows data deduplication since I first saw it in Windows Server 2012 RC. You can never have enough disk space, right? :) It has its downsides (mostly maintenance-related – I’ll describe those later), but the benefit is amazing – the official Microsoft blog claims they were able to squeeze 6 TB of data into 1.6 TB. Another cool thing about data deduplication is that you can even have it running on Windows 8 (see here for details) by importing a package and then enabling a new Windows feature.

The deduplication itself is controlled through a number of PowerShell cmdlets, but the thing is that you can’t really see its progress. To help with that, I came up with the idea of running Get-DedupJob in a loop and showing the output on a progress bar. Here’s the script:

# Poll the running dedup jobs every 2 seconds and show each one as a progress bar
for ($j = 0; $j -le 10000; $j++)
{
  Get-DedupJob | ForEach-Object {
    Write-Progress `
      -Id $_.ProcessId `
      -Activity ($_.Volume + " - " + $_.Type) `
      -Status ($_.State.ToString() + " " + $_.Progress.ToString() + "% complete") `
      -PercentComplete $_.Progress
  }

  Start-Sleep 2
}

You might also be tempted to check what the disk savings are – you can use a similar loop to do it:

# Print the current dedup savings every 2 seconds
for ($j = 0; $j -le 10000; $j++)
{
  Get-DedupStatus
  Start-Sleep 2
}

I am using deduplication to control the size of my VM library in the home lab and it works like a charm – it allows me to keep 400+ GB of VHDs in less than 150 GB. However, you need to remember a few things (read the docs for details):

  • There are three kinds of jobs – optimization, scrubbing and garbage collection. You have to ensure that all of them are running – for example, a simple file delete does not instantly reclaim the unused space. This is especially important in a lab that’s not running 24×7, so consider adjusting the deduplication schedule to suit your needs.
  • Enabling background optimization might have an impact on the longevity of SSD drives due to the additional I/O operations.
  • You need Windows 8/Server 2012 or newer to support deduplication-enabled volumes. Windows 7 or older OSes cannot access them.
  • Last but not least, you can’t enable deduplication on a system drive.

How to UNPIVOT table in SQL Server / UNPIVOT table example in SQL Server

I recently had a similar case where I needed to unpivot multiple related column groups at the same time. It turned out that running multiple UNPIVOTs is actually slower and creates more complex execution plans than a single CROSS APPLY, and the difference grows as the row count grows.

Since I was unable to format this post to correctly display SQL queries inside it, the details are available in a separate post.

Tuvian : Discussion forum for ASP.Net,C#,Ajax,JQuery,JavaScript,HTML,CSS..

What are the PIVOT and UNPIVOT operators in SQL Server

We can use the PIVOT and UNPIVOT relational operators to change a table-valued expression into another table. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output.

Please go through our previous post for a PIVOT sample in SQL.

[Screenshot: table before UNPIVOT]

[Screenshot: table after UNPIVOT]

A simple UNPIVOT sample in SQL Server

Here we are going to demonstrate a very simple UNPIVOT sample without any complexity. We have a table named EmploymentHistory containing each employee’s name along with the names of the previous companies that employee worked for.

Table structure for the simple UNPIVOT sample

CREATE TABLE EmploymentHistory (
    Id INT,
    Employee VARCHAR(500),
    Company1 VARCHAR(500),
    Company2 VARCHAR(500),
    Company3 VARCHAR(500)
)
GO
-- Load Sample data
INSERT…

View original post 282 more words

How to get all queries used in SSRS

Since I’m doing a massive code review resulting in a lot of query tuning recently, I needed to review all the reports on one of our SSRS instances. I didn’t want to go through the reports one by one, but rather dump all the queries and then work with them in bulk. Here’s the query I used:

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition',
                    'http://schemas.microsoft.com/SQLServer/reporting/reportdesigner' AS rd)
SELECT a.Name,
       CAST(a2.q.query('./text()') AS VARCHAR(MAX)) AS Query
FROM (
       SELECT Name,
              CAST(CAST(Content AS VARBINARY(MAX)) AS XML) AS ReportXML
       FROM ReportServer.dbo.Catalog
       WHERE Type = 2
     ) a
CROSS APPLY a.ReportXML.nodes('/Report/DataSets/DataSet') a1(q)
CROSS APPLY a1.q.nodes('./Query/CommandText') a2(q)

It returns a list of reports with all the queries they use – just run it against your SSRS ReportServer database. It works for 2008 – 2012; I can’t check it with 2005, but since the report definition XSD does not change much between 2005 and 2008, it should work there as well.

I’m an MCSA: SQL Server 2012, it’s official! :)

As of April 29, 2013 I am a Microsoft Certified Solutions Associate (MCSA): SQL Server 2012. I passed the last of the three exams (70-463) required for this title.
It was a difficult one – especially since I had not had much SSIS experience in the last 18 months other than creating maintenance plans. I had to spend a few weekends exercising with the Evaluation Edition in order to feel comfortable with my knowledge.

For all of you preparing for this exam, I recommend going through all the SSIS components covered and exercising the various concepts from the course curriculum. The theoretical basis is also important, but the main focus is on the practical aspects of development in SSIS, so do your due diligence.

I still plan to complete the remaining two exams required to become an MCSE: Data Platform this year, so hold on – I will keep posting about that.

Merry Christmas to data professionals

Let me use this opportunity to wish all the best to all of you database professionals out there. Here is what you might want to find under the Christmas B-tree:

DBAs:
Let your backups run smooth and never get broken;
May your recovery never surprise you;
May the drives never break and SAN be always available;
Have licensing be no worry for you;
Let indexes never fragment and their maintenance be transparent to users.

Devs:
May your queries use the best indexes and never cause excessive locks;
May the data be normalized when possible, properly sized and typed;
Let the keys and constraints describe what’s needed;
May you use right tools for the job.

All the best in the coming 2013 to everyone!

What I worked with in my life (in terms of computers)

OK, so this is mostly off-topic as far as the main course of this blog is concerned. Yet I thought I might share the list of stuff I have worked with over the years.

I am in my thirties now and my first contact with computers was in the late 80s, when I played the unforgettable “River Raid” on an Atari 800 XL, courtesy of my uncle. This Atari later became mine and at first I spent my time playing on it, but thanks to it my first non-gaming computer endeavours started about 20 years ago. I was 11 at the time and I first discovered Basic (which sucked on the Atari, especially if you had a tape drive) and then 6502 assembler. I even managed to create a simple game in assembler (I was 13 or so).

Then came the PC. I had an 80486DX2 in 1995, which was a powerful machine back then. Of course, the first steps were games (id’s Doom and Doom 2, to be exact), but since I had only 4 MB of RAM (yes, memory was counted in megabytes then), I needed to squeeze out every possible free byte in order to make the games run. This led me to batch files and multi-boot setups on MS-DOS 6.22, and to further configuration topics like setting up EMS/XMS and UMBs (anyone remember those?). Then I came back to Basic – Microsoft QBasic, which was included in MS-DOS – and tried to create a playable game. In the end QBasic turned out not to be the friendliest or best-performing option, so I got my hands on the C handbook by Kernighan/Ritchie. Using it and the Borland C++ 3.1 compiler I made my first steps into programming. It was 1996.

Then I discovered 80386 assembler – I tried to code some graphics effects with it (with some success, even). I ended up with an almost complete River Raid clone with animations done in a back buffer, most of it coded in C with the graphics library in assembler. After that, in the summer of 1997, I got a quick job through an acquaintance, developing a search engine for a website using Perl and text files (no one thought of it as NoSQL back then, heh). I saw Windows NT 4.0 and IIS for the first time.

Then I went off the grid completely, using the computer only for recreational purposes, until I started a computer science course at university in 2002. I had contact with many technologies (most of them obsolete now, as they already were back then) – Pascal, C/C++, C#, Java, sockets, assembler, Ada, SQL, even Novell NetWare. In the meantime I picked up PHP, JavaScript and some Python, and got some XML experience (XSD/XSLT, XQuery, XPath). The following list wraps up my last five years (more or less since I started my professional career):

  • server operating systems – everything Windows – Server 2003/2008 (with or without R2) and 2012. I didn’t have much experience with NT/2000, though. I had an episode with Linux – Debian 4 – it did work, but that’s all.
  • client operating systems – mostly Windows – 2000/XP/7/8 (consciously skipping Vista). I also tried others – Ubuntu (didn’t like it) and Debian with a GUI (didn’t work for me).
  • virtualization:
    • platforms – Virtual Server 2005 R2 (yes, someone still uses it :P), Windows Server 2008 R2 and 2012 with Hyper-V, and Windows 8 with Client Hyper-V. Hyper-V is cool, especially in version 3.0, can’t argue with that.
    • systems – SCVMM 2008 R2/2012 overseeing a home lab consisting of two Hyper-V hosts; it’s more than enough for a proof of concept :)
  • database servers – SQL Server from 2000 up, and I feel best with 2005 or newer. I also tried Oracle 11 and have an environment prepared for playing with the newest DB2 in the home lab.
  • web servers:
    • IIS 6/7/7.5/8 – included in Server 2003/2008 (R2)/2012. I successfully managed to make several classic ASP websites work with IIS 7.5, and it’s tough – IIS from version 7 up brings many breaking changes for ASP which need to be dealt with.
    • Apache 2.0/2.2 – did a pretty standard config, nothing out of the ordinary. The most complex thing I did was rewrite rules.
  • application servers:
    • SharePoint – I know how to use it and I know how to install it. I installed WSS 3.0 on Windows Server 2003 R2 (that’s easy) and on Windows Server 2008 R2 (that’s more difficult, since it’s not out-of-the-box), and apart from that did some lab installations of SharePoint Server 2010 and 2013. I could administer it, but it would take a lot of motivation for me to think about becoming a SharePoint developer.
    • BizTalk – I tried to install 2010 in the lab but couldn’t make it work properly; still willing to give it another try, though :)
    • Tomcat on Windows – installed it but had absolutely no clue what to do with it after that, so I tossed it away.
    • SAP Business Connector (a rebranded webMethods Integration Server) – I did almost everything with it (versions 4.7/4.8), from installation through administration to development. Good points – it’s Java and it works; bad points – it’s Java and sometimes it doesn’t work.
  • scripting:
    • VBScript – just a few maintenance scripts
    • PowerShell – even more maintenance scripts, but I’m all in for it
  • SAP:
    • I can read ABAP and most certainly would be able to write it, but I never have
    • I know my way around PP/MM modules
    • I know just a bit about WM module
  • programming:
    • I wrote a few useless apps in C# to help me in the office
    • I refreshed my knowledge of JavaScript by getting to know AJAX and jQuery
    • Even tried to do some stuff in VBA for Excel/Access

What about the future? The software world is constantly evolving and it’s speeding up year after year. You have to learn new things if you want to keep up with it. My resolution for the coming year is the following:

  • gain basic experience with analytics, as BI will become more important with Big Data gaining more and more traction
  • keep interest in cloud solutions
  • try out some NoSQL (can anyone recommend a product?)
  • work on application design, including scalability and performance