Linked server antipatterns

After my previous post about using Windows Authentication for linked servers, I realized there are more common linked server practices that can be identified as antipatterns. I’ll describe them here, together with suggestions for solving them.

  1. Incorrect security settings.

    This often happens when an ad-hoc connection is set up to use remote data that is not available locally, for example for reporting purposes. It involves one of the following:

    • The security option is set to “Be made using this security context” and the remote login has a lot of privileges – such as db_owner in some databases or even sysadmin. This may allow a user to elevate their permissions on the remote server and perform actions they are not supposed to.
    • The security option is set to “Be made without using a security context” and the remote server allows anonymous logins to perform some actions. This also leads to users performing actions without being audited, and it requires the remote server to have the guest account enabled, or to grant some securables to the public role in certain databases, which results in uncontrolled access to those databases.

    The solution is to follow the Principle of Least Privilege – ensure that only the required access is granted on the remote server. I recommend either using security option 3, “Be made using the login’s current security context”, or creating a dedicated remote login with minimum rights and mapping to it with option 4 (the downside of the latter method is that it grants exactly the same rights on the remote server regardless of which login is used on the local server).
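    A minimal sketch of the latter approach – mapping all local logins to a dedicated, low-privileged remote login. The server name, login and password are placeholders, not the real configuration:

    -- Hypothetical example: map every local login to one dedicated remote
    -- SQL login that has only the minimum rights needed on the remote server.
    EXEC sp_addlinkedserver
        @server = N'Server2',                 -- placeholder linked server name
        @srvproduct = N'SQL Server';

    EXEC sp_addlinkedsrvlogin
        @rmtsrvname  = N'Server2',
        @useself     = N'FALSE',              -- do not impersonate the local login
        @locallogin  = NULL,                  -- applies to all local logins
        @rmtuser     = N'linked_reader',      -- dedicated low-privileged login
        @rmtpassword = N'...';                -- placeholder

    (Option 3 corresponds to @useself = N'TRUE', in which case no remote login and password are needed.)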

  2. Joins of two linked server queries.

    I noticed this in reporting queries that had to aggregate data from several data sources, where the idea was to use a query like this (executed on Server1):

    select columns
    from Server1.DB1.dbo.Table1 t1
    join Server1.DB1.dbo.Table2 t2
        on t1.KeyColumn = t2.KeyColumn   -- join conditions are placeholders
    join Server2.DB1.dbo.Table1 t3
        on t2.KeyColumn = t3.KeyColumn
    join Server3.DB1.dbo.Table1 t4
        on t3.KeyColumn = t4.KeyColumn
    ...
    

    There are several problems with this query:

    • This query requires a lot of tempdb resources on Server1 – Server2 and Server3 only provide data, while the actual execution takes place on Server1. Before the join is executed, the data must be transferred from the remote servers and placed in temporary worktables.
    • Following from the point above, the network may become a bottleneck if the volume of data to be transferred is large.
    • Even if the remote databases are optimized and properly indexed, the query is executed on Server1 and is nearly impossible to tune, as it operates on temporary worktables in tempdb.
    • Depending on the generated execution plan, the selected physical join operator may be a nested loops join, which triggers one query against the remote server for every row in the pipeline.

    As queries like this are mostly used for reporting purposes, the perfect solution would be to split the query into several datasets and combine them on the client side. I’m not aware of such a feature in Reporting Services up to the 2012 version. My workaround for this case involves one of two options:

    • Use OPENQUERY to group tables by remote server, thus limiting the resources (remote server calls, tempdb usage, pipeline objects, etc.) needed to execute the query, as sketched below. It requires a lot of rework on the query, though.
    • Set up replication or an AlwaysOn readable secondary to consolidate the data on a single machine.
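
    As an illustration of the first option, here is a hedged sketch of the OPENQUERY rewrite: each remote server gets a single pass-through query that runs where the data lives, and only the result sets are joined locally. Column and join-key names are placeholders; if one remote server hosted several of the tables, their join would also go inside the pass-through text:

    -- Hypothetical rewrite of the query above: one pass-through query per
    -- remote server, returning only the columns the local join actually needs.
    select t1.SomeColumn, t3.SomeColumn, t4.SomeColumn
    from DB1.dbo.Table1 t1
    join DB1.dbo.Table2 t2
        on t1.KeyColumn = t2.KeyColumn
    join openquery(Server2,
            'select KeyColumn, SomeColumn from DB1.dbo.Table1') t3
        on t2.KeyColumn = t3.KeyColumn
    join openquery(Server3,
            'select KeyColumn, SomeColumn from DB1.dbo.Table1') t4
        on t3.KeyColumn = t4.KeyColumn;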
  3. Linked server chaining.

    This takes place when server A connects to server B, which in turn connects to server C, effectively allowing users on server A to retrieve data from server C. My suggestion in this case is to avoid the proliferation of linked servers and to set up replication or AlwaysOn instead. This ensures better control over data access.
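    To get an overview of how far the chaining goes, one simple approach (a sketch, not a full audit) is to list the linked servers defined on each instance and follow the hops manually:

    -- Linked servers defined on the current instance; running the same query
    -- on server B shows whether an A -> B -> C chain exists.
    select name, data_source, provider, is_remote_login_enabled
    from sys.servers
    where is_linked = 1;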

This is the third post in a row about linked servers and I am out of ideas on the topic for now. However, if there is something worth adding or correcting, or maybe you would like to ask something, please let me know.

[Story] Replication from remote site

Welcome to the next part of my DBA stories. This time I’m going to tell the story of the replication problems I experienced recently.

The task was pretty standard – set up transactional replication of a database (a few gigabytes in size) from a remote site to the central office over our WAN and schedule it to synchronize at night. At first it seemed easy: there was a one-week deadline and the database had already been replicated in the past, so I thought it was not a big deal.

First it turned out that about half of the 300 tables in the database did not have a primary key, which transactional replication requires for every published table. I had to contact the development team to solve it – it turned out this had been caused by a database collation change they had requested earlier! Luckily, some old copies of the database existed, so they could be scavenged for the definitions of the missing objects.
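
For the record, a quick way to find such tables is to query the catalog views; this is a sketch of the check, not necessarily the exact query I used back then:

-- Tables in the current database without a primary key; such tables cannot
-- be published as articles in transactional replication.
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE NOT EXISTS (SELECT 1
                  FROM sys.key_constraints AS kc
                  WHERE kc.parent_object_id = t.object_id
                    AND kc.type = 'PK');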

After fixing the missing primary keys I proceeded to set up the publication. I chose everything to be published (there were no filtering requirements) and generated the snapshot, but then it could not be applied at the Subscriber! I wasted two days (since I was trying to get it done in line with the assumed schedule) trying to get the snapshot applied. After some research I found that views were causing all the mess.

Two of the views were referencing linked servers which weren’t defined at the Subscriber.

One of the views was referencing a table in another database which existed at the Publisher, but not at the Subscriber.

One of the views was an actual disaster. I spent over an hour trying to get past it until I finally saw it. The script looked like this:

CREATE VIEW viewName
AS

/* CREATE TABLE someTable
( column1,
column2,
...
columnN)
GO

*/

... actual view definition

The message in the Replication Monitor was “unable to find closing comment mark ‘*/’”. And after some time it hit me – GO! GO is not a T-SQL statement but a batch separator, so the script was split at the GO inside the comment and the first batch ended before the closing ‘*/’ was ever reached. Seeing all the issues, and knowing that I had already missed the deadline, I decided to remove the views from the publication and recreate them at the Subscriber if necessary.

Then the snapshot was finally OK, so I started synchronizing it during the day. But the network bandwidth got maxed out, so literally all network traffic to the remote site slowed to a crawl. The snapshot took 14 hours to apply, but now everything is back to normal. The correct schedule is in place, but I think I have some more grey hair due to this damn replication. It’s not my favourite high-availability technology.

[Story] Terrible performance problem

Hello again and welcome to a new series informally categorized as “DBA stories”. Here is what it means – real-life stories depicting situations that are strange, bizarre, sometimes funny (sometimes not), but somehow interesting to a DBA. The choice of stories is arbitrary – as long as I have my own stories, I will try not to include stories posted by someone else. And remember, these things happened, so they might just as well happen to you!

OK, so to begin with, a situation that once made me stay up the whole night. I get a phone call from a help desk agent around 11pm saying that one of the mission-critical systems is suffering from poor performance. OK, I say, let me see – I log on and there it is: CPU load at 100%, with SQL Server barely responding. The system uses transactional replication to maintain a warm standby and I started thinking of invoking business continuity procedures. Then I saw that the 10 GB database had 40 GB of log, and it was growing by 1 GB every half an hour. So, it’s a database problem and I start digging.
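
For context, checking the log size and what is blocking its truncation is straightforward; this is roughly how I would look at it, not a transcript of that night:

-- Log file size and percentage in use, per database.
DBCC SQLPERF(LOGSPACE);

-- What is currently preventing log truncation (e.g. REPLICATION, ACTIVE_TRANSACTION).
SELECT name, log_reuse_wait_desc
FROM sys.databases;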

Restarting the SQL Server service didn’t help, of course – it didn’t remove the source of the problem. So I started looking for that source with Profiler, and then I found it.

The clue was this: one of the components had received a patch with new functionality which – from my (DBA) point of view – took all records in one of the tables and, in a loop, updated all records in each iteration (for each record an UPDATE was issued without a WHERE clause). There were around 5000 records in this table, so each execution of this procedure updated roughly 25 million rows, and that’s not the end of it. The procedure was run once a minute! That’s why the log was growing so fast and replication simply could not keep up either.
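
To make the arithmetic clear, here is a hypothetical reconstruction of what the patched functionality effectively did – the names are invented, as I only saw the statements through Profiler:

-- Hypothetical reconstruction: for each of the ~5000 rows, issue an UPDATE
-- without a WHERE clause, touching all ~5000 rows again: ~25 million updates per run.
DECLARE @id int;

DECLARE row_cursor CURSOR FOR
    SELECT SomeId FROM dbo.SomeTable;        -- invented table and column names

OPEN row_cursor;
FETCH NEXT FROM row_cursor INTO @id;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.SomeTable
    SET SomeFlag = 1;                        -- no WHERE: every row, every iteration

    FETCH NEXT FROM row_cursor INTO @id;
END

CLOSE row_cursor;
DEALLOCATE row_cursor;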

Once the problem was found, one quick call to the application support team with a request to disable this functionality solved it immediately. How did it pass QA, you might ask? Well, there are two morals:

  1. Keep your QA environment as similar to production as possible.
  2. Don’t trust your developers – have it checked two or three times.