Late start into 2014

Long time no post. There’s a number of reasons for that, all of them “overridden” by my laziness. :) I had a strong resolution of keeping good progress with the blog, but reality proved otherwise. Anyway, I am coming back to blogging since there are some ideas started or good to be shared.

SQL Server

First of all, SQL Server 2014 is publicly available since April 2014, and it’s hot. I’m not willing to copy/paste official marketing statements, but In-Memory OLTP (former Hekaton) and buffer pool extensions seem very promising, especially now, when the hardware prices are declining and there is more capacity available (in terms of both hard drive and memory). I tried it at CTP2 stage and I’m going to return to RTM for more testing action – I still haven’t experienced it live in production.
I also have some drafts of XML-related posts – how to process XML in SQL Server and mostly how to extract data to XML. This topic was relevant to me at some point and I had problems finding good comparison with execution statistics. Since I’m revisiting this, I’ll wrap it up in a series of posts.

Oracle

With a change in my job, I’m getting to know Oracle. I’m responsible for deploying an application that is using Database Engine and is hosted on WebLogic Server (both of them 11g). This is a completely new experience for me, so it’s possible some posts related to those areas, for example transition from SQL Server to Oracle.

ALM

Earlier on this year, I challenged myself and passed three exams that are required for MCSD: Application Lifecycle Management. I am starting to think in terms of ALM more and more. Last year I made some experiments with TFS 2013 while it was still in Preview stage, focusing on Continuous Integration and Continuous Delivery. Some time passed – TFS 2013 not only went RTM but already got Update 2 – and it might be reasonable to come back to that as Microsoft is incrementally adding new releases to each update. Another point worth noting here is that Microsoft is giving a TFS to everyone for free at visualstudio.com, of course with limited functionality but even the free plan gives you unlimited projects for up to 5 users, 60 minutes of build and 15000 virtual user load minutes, so a great place to start.

Others

Phase change cooling of Celeron 430 on DFI DK P35 :) -50 Celsius on CPU done!


I came back actively to overclocking – I liked pushing an envelope a lot since my beginning with computers. I got myself a single stage phase change cooling unit (which is basically a stripped refrigerator – see the picture to the right) and a load of CPUs and RAM dating between 2002 and 2010. There’s going to be a lot of fun with that and for those who are interested you can follow me on hwbot.

And I decided to finally change the boring Andreas theme to something more modern. :)

Miscellaneous linked server creation – aliases, other database providers and Excel

After publishing recent posts regarding linked servers I noticed there are still some topics about linked servers that are interesting but not so straightforward. So, I decided to describe them details to cover these points fully. This will include a lot of messing with connection strings – I recommend using http://www.connectionstrings.com as a reference.

Here we go:

  1. Create linked server with different name and Windows Authentication.

  2. This will require using OLE DB provider for SQL Server – we will use a different name (alias) for the linked server and provide a full connection string like:

    Provider=SQLNCLI10;Data Source=myServerAddress;Initial Catalog=myDataBase;Integrated Security=True;
    

    Don’t forget to choose correct security option – I vote for number two or three! :)

  3. Create linked server with automatic failover (for use with AlwaysOn/database mirroring).

  4. This is an extension of an idea above – we will use OLE DB Provider, a different name (alias) for the linked server and provide a full connection string with definitione of failover partner like:

    Provider=SQLNCLI10;Data Source=myServerAddress;Initial Catalog=myDataBase;Integrated Security=True;Failover Partner=mirrorServer
    

    Adding a Failover Partner parameter to the connection string allows the client to attempt automatic reconnection in case of automatic failover, but you have to keep the mirror server up-to-date with server objects, such as logins or Agent jobs, otherwise your application might behave erratically.

  5. Create linked server to MySQL Server.

  6. This one is more tricky. There is no official OLE DB driver for MySQL, so we have to use ODBC instead. It has to be downloaded from MySQL site, installed and then we need to define system ODBC before we can connect to MySQL. After that it’s just a matter of providing the right connection string, for example:

    Driver={MySQL ODBC 5.1 Driver};Server=myServerAddress;Database=myDataBase;User=myUsername;Password=myPassword;Option=3;
    

    Let’s try it. Installation of the driver is straightforward, so the fun starts with system ODBC creation. I have MySQL server ready and waiting at mysql5.beton.test with one database and a table inside that contains two records:

    mysql> use beton
    Database changed
    mysql> select * from table1;
    +----------+-----------+
    | idTable1 | table1col |
    +----------+-----------+
    |        1 | aaaaa     |
    |        2 | acbd      |
    +----------+-----------+
    2 rows in set (0.00 sec)
    mysql>
    

    First I have to create ODBC data source (through Administrative Tools -> ODBC Data Sources): MySQL-ODBC-Step2MySQL-ODBC-Step1

    - since I am on 64-bit system I am using 64-bit driver – you might check if you need 32-bit or 64-bit;
    – I set up System DSN so that SQL Server can use it;
    – I choose Unicode driver and then I provide connection details – data source name, description, target server and port, user and password to connect to MySQL server, and startup database;
    – upon test connection it says “Connection successful”.

    After that initial setup I can proceed with linked server creation in SQL Server. In SSMS, I go to New Linked Server and provide following parameters:

    - Linked Server Name: MYSQL5 (it’s an alias I want to use);
    – Provider – Microsoft OLE DB Provider for ODBC Drivers;
    – Data Source – target server name, mysql5.beton.test in my case;
    – test connection – it should succeed at this point.

    mysql5_linked_serverFew notes about it:

    1. I explicitly mentioned user/password in the connection string. This causes all connections to be done in the context of this user. I couldn’t manage to get the connection working when using user mapping or domain accounts (might be related to ODBC setup). I’ll work on that but I am also open to any ideas about how to get it done.
    2. This does not allow for browsing the catalog we are used to in case of remote SQL Server – you can see the tables, but you can’t simply do SELECT col1, col2 FROM table1. It is caused by incompatibility of metadata in information_schema, so you have to prepare your queries in MySQL and then use OPENQUERY, which has to be configured first via sp_configure:
    3. sp_configure 'ad hoc distributed queries', 1
      go
      reconfigure with override
      go
      

      Then you can query your MySQL server like

      select * from OPENQUERY([MYSQL5], 'select * from table1;')
      

      This is the warning light should start flashing. Since any user can use OPENQUERY and the permissions on the remote server are determined based on the ODBC/linked server definition, you really want to give the account you use to connect the least access rights possible, otherwise be prepared to face uncontrolled data modifications. Also, ad hoc distributed queries is a server setting – once you enable it, everyone can use it in all sorts of ways, so keep it in mind if you want to maintain control over your data (you can also try to use “Disallow ad hoc access” option of linked server provider, but remember that it affects all linked servers using this provider).

      In both cases of PostgreSQL and MySQL, you don’t see the structure on remote database in Object Explorer until you enable ‘Allow inprocess‘. If this option is disabled though, it should protect SQL Server from errors caused by the provider. However, I observed strange behavior of crash of SQL Server when trying to use ODBC linked server provider with “Allow inprocess” option disabled. Steps to reproduce:

      1. Install ODBC driver – I used official drivers (64-bit version) 5.02.05.00 for MySQL and 9.02.01.00 for PostgreSQL.
      2. Set up a System DSN and confirm it works successfully using test connection option.
      3. In SSMS, create a linked server for MySQL/PostgreSQL. Check that “Allow inprocess” option is unchecked in MSDASQL provider properties.
      4. Open new query window and run query similar to SELECT * FROM OPENQUERY(linked_server, ‘SELECT * FROM table’), where linked_server is the name of newly created linked server, and table is any table. this will give error like:
      5. Msg 7399, Level 16, State 1, Line 8
        The OLE DB provider "MSDASQL" for linked server "MYSQL5" reported an error. Access denied.
        Msg 7350, Level 16, State 2, Line 8
        Cannot get the column information from OLE DB provider "MSDASQL" for linked server "MYSQL5".
      6. Run this query two more times – first gives exactly the same error, but after second SQL Server crashes:
        Faulting application name: sqlservr.exe, version: 2011.110.3000.0, time stamp: 0x5082086a
        Faulting module name: ntdll.dll, version: 6.2.9200.16384, time stamp: 0x5010acd2
        Exception code: 0xc0000374
        Fault offset: 0x00000000000ea2b9
        Faulting process id: 0xdbc
        Faulting application start time: 0x01cec279fec9e95b
        Faulting application path: C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\sqlservr.exe
        Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
        Report Id: 665792a8-2e6d-11e3-93fd-00155d040513
        Faulting package full name: 
        Faulting package-relative application ID: 
        
      7. I experienced this on SQL Server 2012 SP1 builds 3000 and 3128, but I suspect it’s a bug in 2012 SP1 as I haven’t found this mentioned in changelists of all 2012 SP1 CUs. I also found similar topic on SQLServerCentral forums which also mentions build 3128. At first I suspected GDR update 3128 to be the cause, but after uninstalling it and going back to 2012 SP1 RTM the issue persists.
      8. The workaround is to enable “Allow inprocess” on MSDASQL linked server provider – after enabling it all those problems go away.

  7. Create linked server to PostgreSQL Server.

  8. Linking to PostgreSQL is similar to MySQL, first you have to install ODBC driver for PostgreSQL (available here), set up system ODBC and create linked server in reference to ODBC definition. Then query a linked server using OPENQUERY or four-part name.

  9. Create linked server to Excel/Access.

  10. There is a comprehensive guide to linking to Excel/Access available here. In short, you have to install data provider (Microsoft.Jet.OLEDB.4.0 for Excel up to 2003 or Microsoft.ACE.OLEDB.12.0 for 2007 or newer) and then use it when defining new linked server. Sample T-SQL code:

    EXEC sp_addlinkedserver
        @server = 'ExcelServer2',
        @srvproduct = 'Excel', 
        @provider = data_provider_name,
        @datasrc = filename,
        @provstr = 'excel_version;IMEX=1;HDR=YES;'
    

    In the pseudocode above, data_provider_name is the provider to be used (based on Excel/Access version), excel_version is Excel 8.0 for 2003 and Excel 12.0 for 2007 and above. For more details I recommend visiting mentioned link.

Summary – in this post I described how to use aliases for linked SQL Servers, how to set up a linked server for MySQL/PostgreSQL and Excel. I also showed an error I spotted for ODBC linked server provider in SQL Server 2012 SP1. As usual, I await comments/questions.

SQL SERVER – How to Access the Previous Row and Next Row value in SELECT statement? – from Pinal Dave

Szymon Wójcik:

After reading forementioned article by Pinal Dave I wanted to compare those methods and I even managed to make the CTE version to run in acceptable time – around 0.5 seconds on 20k records in AW2012 Person table (the original runs over 95 seconds). The trick is to use numbers table – I didn’t bother creating a new one, so I used another CTE that created 1 million of numbers.
Worth noting is the fact that it still is nowhere near the performance of LAG/LEAD functions in 2012, mostly due to I/O required.

with Numbers as
 (
 select (v1.number * 1024) + v2.number as Num
 from
 master.dbo.spt_values v1
 inner join master.dbo.spt_values v2 on v1.type = v2.type
 where v1.type = 'P' and v1.number < 1024 and v2.number < 1024
 ),
 ThisPrevNext
 as
 (select Num - 1 Prev, Num, Num + 1 Nex
 from Numbers),
 CTE AS (
 SELECT
 rownum = ROW_NUMBER() OVER (ORDER BY p.BusinessEntityID),
 p.FirstName
 FROM Person.Person p)
 select th1.FirstName, th2.FirstName, th3.FirstName
 from ThisPrevNext tpn
 left join CTE th1 on th1.rownum = tpn.Prev
 left join CTE th2 on th2.rownum = tpn.Num
 left join CTE th3 on th3.rownum = tpn.Nex
 where tpn.Num between 1 and (select count(*) from CTE)

The reason for that is that CTE version has to scan underlying table several times, while LEAD/LAG do it in one pass. While the difference is negligible in case of 20k records (on my test VM it’s 172 ms vs 533 ms), it will be clearly visible when the table is larger. I copied Person.Person table into a new table and duplicated it several times so that it contains 639k records.

The results are as follows:

  1. LEAD/LAG – ~4 seconds, 18630 reads
  2. CTE with numbers table – ~35 seconds, 59192 reads
  3. original CTE – after 50 minutes I cancelled the query as it was not yet 20% done :)

The outcome is to use LEAD/LAG where possible – if you consider upgrading to 2012 and require such functionality, this might be one of your arguments.

Originally posted on Journey to SQL Authority with Pinal Dave:

The first email I read this morning had only one statement in it, and it gave me an idea to write this blog post.

“How to access Previous Row Value and Next Row Value in SELECT statement?”

Very interesting question indeed. The matter of the fact, I have written about this in my earlier blog Introduction to LEAD and LAG – Analytic Functions Introduced in SQL Server 2012. Let us quickly understand it in it with the help of script.

For example here is the column where we want to get access to the previous row and the next row in SELECT statement.

USE AdventureWorks2012
GO
SELECT p.FirstName
FROM Person.Person p
ORDER BY p.BusinessEntityID
GO

Now let us use the new Analytic Function from SQL Server 2012 to get previous and next row values.

SELECT
LAG(p.FirstName) OVER (ORDER BY p.BusinessEntityID) PreviousValue,
p.FirstName,
LEAD(p.FirstName

View original 148 more words

Linked servers’ antipatterns

After previous post about using Windows Authentication for linked servers I realized there are more common points with regards to linked servers that might be identified as antipatterns. I’ll describe those together with suggestions of solving them.

  1. Incorrect security settings.

    This situation takes place often when ad-hoc connection is made to use remote data for example for reporting purposes (which is not available locally). It involves the following:

    • Security option is set to “Be made using this security context” and remote login has a lot of privileges – like db_owner in some databases or even sysadmin. This might allow a user to elevate his permissions on remote server and perform action he is not intended to do.
    • Security option is set to “Be made without using security context” and remote server allows anonymous logins to perform some actions. This will also lead to users performing actions without being audited. It also requires remote server to have guest accounts enabled or include some securables into public role in some databases that will cause uncontrolled access to these databases.

    The solution for that is to follow Principle of Least Privilege – ensure only required access is granted on the remote server. I recommend either using security option 3 “Be made using the login’s security context” or creating a dedicated login and granting it minimum rights together with option 4 (downside of latter method is that it grants exactly the same rights on remote server regardless of login used on local server).

  2. Joins of two linked server queries.

    I noticed this in reporting queries that had to aggregate data from several data sources and the idea was to use query like (it’s executed on Server1):

    select columns
    from
    Server1.DB1.dbo.Table1 t1
    join
    Server1.DB1.dbo.Table2 t2
    join
    Server2.DB1.dbo.Table1 t3
    join
    Server3.DB1.dbo.Table1 t4
    ...
    

    There are several problems with this query:

    • This query requires a lot of tempdb resources on Server1 – Server2 and Server3 only provide data, but the actual execution takes place on Server1. So before the join is executed, data must be transferred from remote servers and placed in temporary tables.
    • Following from the point above – network might be a bottleneck if volume of data to be transferred is huge.
    • Even if remote databases are optimized and properly indexed, since query is executed in Server1 it’s nearly impossible to tune it as it uses temporary worktables in tempdb.
    • Based on generated execution plan, it is also possible that selected physical join operator is nested loop join and it will trigger a number of queries to be executed on remote server equal to row count in the pipeline.

    As the queries like that are mostly used for reporting purposes, the perfect solution would be to split the query into several datasets and let it be processed on the client side. I’m not aware of such feature in Reporting Services up to 2012 version. My workaround for this case involves one of two options:

    • Use OPENQUERY to group tables by remote server, thus limiting a number of resources (remote server calls, tempdb resources, pipeline objects etc.) needed to execute the query. It will require a lot of rework on the query, though.
    • Set up replication or AlwaysOn readable secondary to consolidate data on single machine.
  3. Linked server chaining.

    This takes place when server A connects to server B, which in turn connects to server C, effectively allowing users on server A to retrieve data from server C. My suggestion in this case is to avoid proliferation of linked servers and setting up replication or AlwaysOn instead. This will ensure better control over data access.

This is a third post in a row about linked servers and I am out of ideas regarding those for now. However, if there’s something valid or requiring correction, or maybe you’d like to ask something, please let me know.

Remote stored procedure execution through linked server

RPCOutI am recently doing a lot of experiments with linked servers. Last one included enabling remote stored procedure call and it’s really quick – it only requires RPC Out to be set to true in the linked server definition (on the right). You can also use sp_serveroption to set this value.

As it turns out RPC setting does not do anything – see for example here. Remember about granting rights on the remote server – giving too much access may lead to data leaks or uncontrolled data modifications on the remote machine. As usual, we want to have as limited access to remote resources as possible.

I also observed strange behaviour – I used two servers SQL12T1 and SQL12T2 in this demo. I first created linked server on T1 pointing to T2 using SQL Server provider and then I tried to created another one on T2 which points to T1. However, I used default provider (Microsoft OLE DB Provider for SQL Server) without specifying any data connection and my newly created linked server pointed to localhost (take a look below – at first I thought that T1 is redirecting me to T2 somehow, but then it became clear to me it’s connecting to localhost). Keep an eye on it – make sure that you choose SQL Server if you want to input server name only or provide valid connection string for OLE DB.

SQL12T1_2 and SQL12T2

Setting up linked servers using Windows Authentication and how to solve multiple hop problem

This post will cover a topic of connecting linked servers using Windows Authentication. Linked server is a hot topic since longer than I am able to imagine (namely SQL Server 7.0) but it needs a bit of attention to understand it properly. The focus will be on setting up linked server to another SQL Server for use of Windows Authentication.

Generally, a linked server is a server object (on machine A) that allows you to work with information from a remote data source (call it B for the sake of clarity). This scenario assumes that both database servers A and B belong to your domain. Now, in the simplest scenario you might be tempted to set up a SQL login on remote server (B) with access rights to required data and create linked server on A using security option #4 “Be made using this security context” providing newly created login and password.

Done? Done.

Wrong.

First of all, you created an enormous, gigantic security hole allowing all users of server A to use the data from server B based on access rights of SQL login. If the login has write access, you are even able to modify remote data.

Second, you anonimized access to server B – all connections from server A will use your created login. If you need to check or log access, you will lose this part of information.

You think about it and decide to correct it. There are two options for consideration:

  • Not be made – if you do not define login mapping explicitly, connection will fail. See note about login mapping below.
  • Be made using the login’s current security context – you will use current login credentials. This needs to be validated when data on server B is sensitive – you might not want this data to be available on server A through linked server even if the user has access to it on server B.

Regarding login mapping – you may provide local logins which are ONLY representing users. You cannot add groups, but specify one-to-one local-to-remote login mapping. This might require additional work to set up and maintain properly – imagine you need to provide access through linked server to 50 people and you have not only to create 50 local logins, but remember to update linked server mapping definition.

You might have already started to consider Windows Authentication. That’s a good approach, but will require additional points to consider:

  • In case of linked server permission chaining does not occur. You have to set up access to remote resources, including creating logins and mapping them to users.
  • In order for delegation to work, Active Directory service accounts for SQL Server instances must be enabled for delegation, otherwise “hop problem” will occur. It also requires that SPNs (Service Principal Names) are set up properly in Active Directory for SQL Server service account.

Let’s have an example for that. We need three SQL Servers in a domain – I am using SQL Server 2012 SP1 for this demo – but it will work with previous versions as well. My domain is called beton.test and database servers are SQL12T1, SQL12T2 and SQL12T3. Each of the database servers runs under separate domain account:

  • SQL12T1 under BETON\sql12t1srv
  • SQL12T2 under BETON\sql12t2srv
  • SQL12T3 under BETON\sql12t3srv

As a preparation I also created following objects:

  • On SQL12T1:
    • a database named BazaS1
    • a table in database BazaS1 called B1T1
    • a linked server to SQL12T2 without mappings and security settings set to “Be made using the login’s current security context”.
  • On SQL12T2:
    • a database named BazaS2
    • a table in database BazaS2 called B2T1
    • a linked server to SQL12T3 without mappings and security settings set to “Be made using the login’s current security context”.
    • a view in database BazaS2 called S3LinkedView which joins data from SQL12T2 and SQL12T3
  • On SQL12T3:
    • a database named BazaS3
    • a table in database BazaS3 called B3T1
  • I also need a separate Active Directory group that will be a db_owner of each databases (SQL Server Group 1 for SQL12T1, SQL Server Group 2 for SQL12T2 and SQL Server Group 3 for SQL12T3). I do it for the sake of simplicity – in the real world you might assign more strict permission to the groups. As I’m connecting to SQLT2T1 I also have to grant access for Group 1 to databases on SQL12T2 and SQL12T3 as well as for Group 2 to databases on SQL12T3.

The goal is to retrieve data from all three servers by executing a query on SQL12T1 that will join data from table in BazaS1 with view in BazaS2 on SQL12T2 (that will return processed data from both SQL12T2 and SQL12T3). Here are scripts to create databases with some test data (click to expand script source code):

USE [master]
GO

CREATE DATABASE [BazaS1]
 CONTAINMENT = NONE
 ON  PRIMARY
( NAME = N'BazaS1', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\BazaS1.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON
( NAME = N'BazaS1_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\BazaS1_log.ldf' , SIZE = 1536KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

ALTER DATABASE [BazaS1] SET COMPATIBILITY_LEVEL = 110
GO

USE [BazaS1]
GO

CREATE USER [BETON\SQL Server Group 1]
GO
ALTER ROLE [db_owner] ADD MEMBER [BETON\SQL Server Group 1]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[B1T1](
	[a] [int] NULL,
	[f] [float] NULL
) ON [PRIMARY]

GO
INSERT [dbo].[B1T1] ([a], [f]) VALUES (1, 1)
GO
INSERT [dbo].[B1T1] ([a], [f]) VALUES (2, 5)
GO
INSERT [dbo].[B1T1] ([a], [f]) VALUES (3, -2.5)
GO

USE [master]
GO

/****** Object:  LinkedServer [SQL12T2]    Script Date: 9/3/2013 6:15:27 AM ******/
EXEC master.dbo.sp_addlinkedserver @server = N'SQL12T2', @srvproduct=N'SQL Server'
 /* For security reasons the linked server remote logins password is changed with ######## */
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'SQL12T2',@useself=N'True',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL

GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'collation compatible', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'data access', @optvalue=N'true'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'dist', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'pub', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'rpc', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'rpc out', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'sub', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'connect timeout', @optvalue=N'0'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'collation name', @optvalue=null
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'lazy schema validation', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'query timeout', @optvalue=N'0'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'use remote collation', @optvalue=N'true'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T2', @optname=N'remote proc transaction promotion', @optvalue=N'true'
GO
USE [master]
GO

CREATE DATABASE [BazaS2]
 CONTAINMENT = NONE
 ON  PRIMARY
( NAME = N'BazaS2', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\BazaS2.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON
( NAME = N'BazaS2_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\BazaS2_log.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

ALTER DATABASE [BazaS2] SET COMPATIBILITY_LEVEL = 110
GO

USE [BazaS2]
GO

CREATE USER [BETON\SQL Server Group 1] FOR LOGIN [BETON\SQL Server Group 1]
GO

CREATE USER [BETON\SQL Server Group 2] FOR LOGIN [BETON\SQL Server Group 2]
GO
ALTER ROLE [db_owner] ADD MEMBER [BETON\SQL Server Group 1]
GO
ALTER ROLE [db_owner] ADD MEMBER [BETON\SQL Server Group 2]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[B2T1](
	[a] [int] NULL,
	[s] [varchar](50) NULL
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create view [dbo].[S3LinkedView] as
select t.*, t2.d from B2T1 t inner join [SQL12T3].BazaS3.dbo.B3T1 t2 on t.a = t2.a

GO
INSERT [dbo].[B2T1] ([a], [s]) VALUES (1, N'horror')
GO
INSERT [dbo].[B2T1] ([a], [s]) VALUES (2, N'error')
GO
INSERT [dbo].[B2T1] ([a], [s]) VALUES (4, N'terror')
GO

USE [master]
GO

/****** Object:  LinkedServer [SQL12T3]    Script Date: 9/3/2013 6:20:40 AM ******/
EXEC master.dbo.sp_addlinkedserver @server = N'SQL12T3', @srvproduct=N'SQL Server'
 /* For security reasons the linked server remote logins password is changed with ######## */
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'SQL12T3',@useself=N'True',@locallogin=NULL,@rmtuser=NULL,@rmtpassword=NULL

GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'collation compatible', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'data access', @optvalue=N'true'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'dist', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'pub', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'rpc', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'rpc out', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'sub', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'connect timeout', @optvalue=N'0'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'collation name', @optvalue=null
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'lazy schema validation', @optvalue=N'false'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'query timeout', @optvalue=N'0'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'use remote collation', @optvalue=N'true'
GO

EXEC master.dbo.sp_serveroption @server=N'SQL12T3', @optname=N'remote proc transaction promotion', @optvalue=N'true'
GO
USE [master]
GO

/****** Object:  Database [BazaS3]    Script Date: 9/3/2013 6:22:06 AM ******/
CREATE DATABASE [BazaS3]
 CONTAINMENT = NONE
 ON  PRIMARY
( NAME = N'BazaS3', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\BazaS3.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON
( NAME = N'BazaS3_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\BazaS3_log.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

ALTER DATABASE [BazaS3] SET COMPATIBILITY_LEVEL = 110
GO

USE [BazaS3]
GO

CREATE USER [BETON\SQL Server Group 1] FOR LOGIN [BETON\SQL Server Group 1]
GO
CREATE USER [BETON\SQL Server Group 3] FOR LOGIN [BETON\SQL Server Group 3]
GO

ALTER ROLE [db_owner] ADD MEMBER [BETON\SQL Server Group 1]
GO
ALTER ROLE [db_owner] ADD MEMBER [BETON\SQL Server Group 3]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[B3T1](
	[a] [int] NULL,
	[d] [smalldatetime] NULL
) ON [PRIMARY]

GO
INSERT [dbo].[B3T1] ([a], [d]) VALUES (1, CAST(0xA1D40561 AS SmallDateTime))
GO
INSERT [dbo].[B3T1] ([a], [d]) VALUES (2, CAST(0xA1D30561 AS SmallDateTime))
GO
INSERT [dbo].[B3T1] ([a], [d]) VALUES (5, CAST(0xE92A0561 AS SmallDateTime))
GO

A little explanation – each of the tables will contain three records – one holds a floating-point value, second – a string and third – a date/time value. Each of has a candidate key of a on which a join will be performed. The view returns two rows joining a string (from SQL12T2.BazaS2.dbo.B2T1 table) with datetime (from SQL12T3.BazaS3.dbo.B3T1 table).
I am going to execute query below which is supposed to return three rows:

select t1.a, t1.f, t2.s, t2.d
from BazaS1.dbo.B1T1 t1
left join [SQL12T2].BazaS2.dbo.S3LinkedView t2
on t1.a = t2.a

The expected result is as follows:

linked_server_results

 

 

 

 

After going through initial setup – creating AD accounts/groups, creating databases/tables, creating logins and setting up permissions – the most it’s possible to be done on SQL12T1 is to join a table BazaS1.B1T1 on SQL12T1 with BazaS2.B2T1 on SQL12T2. However, this might fail if the service accounts of SQL Servers are not set properly in AD. To solve this, first check the SPN using

setspn -L BETON\sql12t1srv (or whatever service account is used on SQL12T1)

This command should get a list of registered SPNs for the service account. It should contain SPN for SQL Server service in format:

MSSQLSVC/Server_FQDN:port

If it’s missing, you should set correct SPN using following syntax:

setspn -S MSSQLSVC/Server_FQDN:port service_account

In this case this will be:

setspn -S MSSQLSVC/sql12t1.beton.test:1433 BETON\sql12t1srv

Important – after making changes to SPNs respective services must be restarted in order to take effect. This has to be taken into consideration while planning changes to production environments.

After SPNs are set properly linked servers should start working, however it’s still not possible to do a second hop while querying a view on SQL12T2 from SQL12T1 (remember that a view joins data from SQL12T3). The error message will say that login failed for “NT AUTHORITY\ANONYMOUS LOGON”. The solution for that is to set up a delegation in Active Directory for SQL Server service accounts – see the screenshot for details:
delegation_settings

 

 

 

 

 

 

 

The idea behind this is simple – you delegate from sql12t1srv account to SQL Server service on SQL12T2 and from sql12t2srv account to SQL Server service on SQL12T3. After that restart the services once again so that the delegation kicks in and finally it starts working, the query retrieves expected results.

Wrapping this all up – we discussed a few points regarding linked servers, made a point about Windows Authentication and then had an example how to configure security properly in order to fix double hop problem. As usual, I await your questions and suggestions.

Multiple UNPIVOTs versus CROSS APPLY analyzed

Few days ago I posted a bookmark to a blog post where an example of multiple UNPIVOTs was given. I stated that such scenario – UNPIVOT of multiple groups which contain related data – is not optimal and CROSS APPLY should be preferred.

To demonstrate this, let’s use the example mentioned in the original post and compare execution plans and execution statistics. First data sample will contain three rows, second – 96 rows and third one – 3072. I am testing it on SQL Server 2012 SP1 in a 4-core VM at home lab.

The queries to be compared are:

SELECT Id,
ROW_NUMBER()OVER(Order By ID) as NewID,
Employee,
Company,
City
FROM
( SELECT Id, Employee, Company1, Company2, Company3, City1, City2, City3  FROM EmploymentHistoryWithCity ) Main
UNPIVOT ( Company FOR companies IN (Company1, Company2, Company3) ) Sup
UNPIVOT ( City For Cities IN (City1, City2, City3 ) ) Ct
WHERE RIGHT(companies,1) = RIGHT(Cities,1)
SELECT Id,
ROW_NUMBER()OVER(Order By ID) as NewID,
Employee,
Company,
City
from  EmploymentHistoryWithCity
CROSS APPLY (VALUES([Company1], [City1]), ([Company2], [City2]), ([Company3], [City3])) u(Company, City)

With just three records in source table the execution statistics are just about the same, with difference being just about the error margin. Looking at the execution plans we spot a first difference (top – original UNPIVOT, bottom – CROSS APPLY):

cross apply vs unpivot_3 rows

Since we use UNPIVOT twice, a join must be performed twice using Nested Loops operator whereas CROSS APPLY requires only one join. Additionally filtering must be done in UNPIVOT case to eliminate extra rows from being generated and it might have an impact on performance as we’re using RIGHT(…) = RIGHT(…) condition which will have to be evaluated row-by-row. Let’s add more rows to the table to see how it affects the query execution.

Using following query will create 93 more rows:

INSERT INTO EmploymentHistoryWithCity
SELECT
[Employee]
,[Company1]
,[Company2]
,[Company3]
,[City1]
,[City2]
,[City3]
FROM [dbo].[EmploymentHistoryWithCity]
GO 5

The execution takes a little bit longer but both queries are still on par with each other:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 106 ms.

(288 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 107 ms.

The execution plans start to show a little difference:

cross apply vs unpivot_96 rows

The second join in UNPIVOT query costs about 15% of total query cost making for the difference between queries. Also second filter comes into play – with 96 rows first join puts 288 rows in the pipeline, but second join multiplies it by three again and makes it 864 rows that need to be scanned. Again, let’s add more rows and see what happens – after using the query above row count goes from 96 to 3072.

Statistics first:

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 339 ms.

(9216 row(s) affected)
Table 'EmploymentHistoryWithCity'. Scan count 1, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 159 ms.

And the execution plan:

cross apply vs unpivot_3072 rows

The difference is now clearly visible. The second Nested Loops operator has dominated the query – it has to be executed for every row returned from first UNPIVOT. We might help the optimizer a bit at this point and create a clustered index on primary key on ID column and then try to add indexes on Company* and City* fields to try to help the UNPIVOT operator.

The creation of clustered index makes the difference even more visible. The statistics show that in this case UNPIVOT requires 303 ms, while CROSS APPLY will make it in 199 ms. But the execution plan shows 75% – 25%. That’s because data extraction from the table cost less, but the whole join operation will be unaltered. This way we only made cheaper the part of the query that was already quick:

cross apply vs unpivot_3072 rows_clustered index

Adding nonclustered indexes on Company and City fields does not help either. This is because both join will have to be done in memory and they have to be nested loops to secure UNPIVOT functionality – you have to scan the values in order to be able to output the correct result.

Summarizing – if you have groups of columns that need to be UNPIVOTed together (being connected with a sort of join dependency) use CROSS APPLY with VALUES to do it. The other question is a design of such table – this is not in line with fifth normal form and should be normalized into two tables (if required). This will allow for more optimal resource usage.