Yet another SQL Server DBA…

Just another SQL Server DBA blog

Archive for September, 2011

SQL Server Hardware Choices Made Easy by Glenn Berry – a quick review

Posted by Szymon Wójcik on September 25, 2011

I have come across Glenn Berry’s e-book by accident on twitter while ago by accident, downloaded it and then liked it very much. I have read it over one evening and first thing I did next day after arriving to work was forwarding it to all my coworkers.

What was the most interesting for me (as I share Glenn’s interest in hardware) was the detailed description of several interesting configurations using current hardware, as well as some future trends (like enterprise SSD drives, AMD Bulldozer or new Intel CPU). It just picks up where most of the books and guides leave you – for instance when most authors tell you to set up RAID-1 for logs and RAID-10 for database files, Glenn gives you an overview what actual models and what exact configuration may be applied and what is the cost of it. Furthermore, he shows what CPUs and in which configuration are preferred depending on your worklad (OLTP differs from OLAP). You will see that RAM is cheap and you don’t have to save on it. Last but not least, you will get a pocket-friendly suggestion on how to build a workstation for SQL Server.

If you have some second thoughts while picking up hardware for your new SQL Server, you have always bought 1-2-3-style machines (CPU/RAM/HDD count) or you’re simply interested in computer hardware and databases, this book is for you. Trust me, you won’t be disappointed.

Posted in SQL Server | Tagged: , , | Leave a Comment »

[Story] Awful scripting

Posted by Szymon Wójcik on September 19, 2011

It happened to me few years ago, while I still was more developer than a DBA. I needed to create a database in development environment with identical structure to the one in production. Easy thing, you say, have a "Generate Scripts” wizard do the job. Well, you’ll see.

By the way, I would already omit one crucial detail – it was 3pm on Friday afternoon.

Since I don’t have enough rights to do it on my own (SOX forbid!), I go with a registered support request to admin team asking for a favour. The beginning was simple – Generate Scripts, all objects, new query window, 500 objects, a minute and we’re done. OK, let’s run, says he and clicks ‘”Execute”.

OMG, I thought seeing all those red messages in the query pane. Unable to login.

In about half an hour users start calling that they are unable to continue production and a factory is going to stop.

We call a senior admin with all the information we have and he says “restore the master”. “What?!” we ask. So in about an hour he drops by (he was already home), brings down the server, runs it in a single-user mode and restores master database from warm standby to primary production server. It takes him half an hour, factory doesn’t stop, the world is saved and we (me and the other guy who clicked Execute) are laughed at for a year. Morals?

  1. Be careful what you wish (script) for – it turned out that we scripted DROP and CREATE, for unknown reason. Double and triple check.
  2. If you have a script, but you’re unsure of it’s behaviour, don’t ever test it on production.
  3. If you have a job to do and it’s Friday afternoon, ask yourself if it can wait until Monday morning – you might have a weekend off.
  4. Have limited trust in the junior guys – let someone more experienced supervise them a bit.

Posted in DBA stories | Tagged: , , | Leave a Comment »

My question at DBA Quiz

Posted by Szymon Wójcik on September 16, 2011

I was a little shocked when I got mail from Jacob Sebastian, who is a man behind beyondrelational.com, asking me to take part in DBA Quiz 2011 as an expert. I was flattered by the invitation, since I don’t consider myself an expert in SQL Server, so I tried hard to come up with something interesting.

At first I thought about index maintenance-related stuff (remember my posts about indexes?) but then I thought it’s already been explored so well since them fragmentation and missing index DMVs are with us since SQL Server 2005, so I started looking somewhere else. After two days of intensive thinking and reading I got idea – it will be storage-related. The only thing was to provide the example and the longest part was to get SQL Server 2008 R2 installed on my old laptop.

So there it is – first a little story, then a question itself. Imagine you’re a stubborn (or maybe creative :D ) DBA that does not take for granted all what you’re told. In the beginning of your career, you read that storage-wise you can get as deep as a filegroup when it comes to tables – you can only assign it to filegroup or partition. Now you want to show that it’s possible to find a data in a file.

You have a big database, with 3 filegroups, each of them consisting of 3 files. You create table t1 on first filegroup with two columns int and varchar(100) and t2 located on second filegroup which has only two columns of type varchar(20) and varchar(50). Both tables don’t have indexes. You insert into first table two records (1, ‘aaa’) and (666, ’666′), and insert into second table also two records (‘abcabc’, ‘abcdefghi’) and (‘defdef’, ‘qwerty’). What steps will you take to check in which file and on which position in this file record (‘defdef’, ‘qwerty’) resides?

As an answer, two things are required:

  1. File name and offset where the requested record is located.
  2. A procedure how it was obtained (stopping SQL Server and searching contents of files with text/hex editor will not be counted as solution).

Here’s the script to create required database and tables (you might have to adjust file paths):

CREATE DATABASE [TableLocateTest] ON  PRIMARY 
( NAME = N'TLT1', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT1.mdf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
 FILEGROUP [FG1] 
( NAME = N'TLT1_1', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT1_1.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
( NAME = N'TLT1_2', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT1_2.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
( NAME = N'TLT1_3', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT1_3.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
 FILEGROUP [FG2] 
( NAME = N'TLT2_1', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT2_1.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
( NAME = N'TLT2_2', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT2_2.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
( NAME = N'TLT2_3', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT2_3.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
 FILEGROUP [FG3] 
( NAME = N'TLT3_1', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT3_1.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
( NAME = N'TLT3_2', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT3_2.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB ), 
( NAME = N'TLT3_3', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT3_3.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'TLT_log', FILENAME = N'G:\Microsoft SQL Server 2008 R2\MSSQL10_50.SS2K8R2I1\MSSQL\DATA\TLT_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)
GO
ALTER DATABASE [TableLocateTest] SET COMPATIBILITY_LEVEL = 100
GO
ALTER DATABASE [TableLocateTest] SET ANSI_NULL_DEFAULT OFF 
GO
ALTER DATABASE [TableLocateTest] SET ANSI_NULLS OFF 
GO
ALTER DATABASE [TableLocateTest] SET ANSI_PADDING OFF 
GO
ALTER DATABASE [TableLocateTest] SET ANSI_WARNINGS OFF 
GO
ALTER DATABASE [TableLocateTest] SET ARITHABORT OFF 
GO
ALTER DATABASE [TableLocateTest] SET AUTO_CLOSE OFF 
GO
ALTER DATABASE [TableLocateTest] SET AUTO_CREATE_STATISTICS ON 
GO
ALTER DATABASE [TableLocateTest] SET AUTO_SHRINK OFF 
GO
ALTER DATABASE [TableLocateTest] SET AUTO_UPDATE_STATISTICS ON 
GO
ALTER DATABASE [TableLocateTest] SET CURSOR_CLOSE_ON_COMMIT OFF 
GO
ALTER DATABASE [TableLocateTest] SET CURSOR_DEFAULT  GLOBAL 
GO
ALTER DATABASE [TableLocateTest] SET CONCAT_NULL_YIELDS_NULL OFF 
GO
ALTER DATABASE [TableLocateTest] SET NUMERIC_ROUNDABORT OFF 
GO
ALTER DATABASE [TableLocateTest] SET QUOTED_IDENTIFIER OFF 
GO
ALTER DATABASE [TableLocateTest] SET RECURSIVE_TRIGGERS OFF 
GO
ALTER DATABASE [TableLocateTest] SET  DISABLE_BROKER 
GO
ALTER DATABASE [TableLocateTest] SET AUTO_UPDATE_STATISTICS_ASYNC OFF 
GO
ALTER DATABASE [TableLocateTest] SET DATE_CORRELATION_OPTIMIZATION OFF 
GO
ALTER DATABASE [TableLocateTest] SET PARAMETERIZATION SIMPLE 
GO
ALTER DATABASE [TableLocateTest] SET  READ_WRITE 
GO
ALTER DATABASE [TableLocateTest] SET RECOVERY FULL 
GO
ALTER DATABASE [TableLocateTest] SET  MULTI_USER 
GO
ALTER DATABASE [TableLocateTest] SET PAGE_VERIFY CHECKSUM  
GO
ALTER DATABASE [TableLocateTest] MODIFY FILEGROUP [FG1] DEFAULT
GO
USE [TableLocateTest]
GO
CREATE TABLE dbo.t1
    (
    a int NULL,
    b varchar(100) NULL
    )  ON FG1
GO
CREATE TABLE dbo.t2
    (
    a varchar(20) NULL,
    b varchar(50) NULL
    )  ON FG2
GO
insert into dbo.t1 (a, b) values(1, 'aaa'), (666, '666')
insert into dbo.t2 (a, b) values ('abcabc', 'abcdefghi'), ('defdef', 'qwerty')
go
checkpoint
go

Enjoy! :) If you want to submit an answer, just visit question page at beyondrelational.com and drop a comment. The submissions are allowed until 15th October 2011.

Posted in SQL Server | Tagged: , | 1 Comment »

What is an index – a theory (part 1/4 – dense and sparse indexes)

Posted by Szymon Wójcik on September 12, 2011

Since I had written a few posts about indexes before, I thought it would be good to shed some light on theory behind indexes in SQL Server. It would be divided into four parts:

  1. Dense and sparse indexes (this post).
  2. B-trees – searching for a value and operation efficiency.
  3. B-trees – inserting an item, operation cost and page splits.
  4. B-trees – deleting an item, operation cost and page merge.

By no means I am comfortable with saying that it’s a definitive source of knowledge about indexes in SQL Server. It’s a result of research conducted by me based on various sources. If you are interested to know more, try reading those books first:

  • Molina, Ullman Database Systems – The Complete Book, chapter 13
  • Knuth The Art of Computer Programming, chapter 6.2.4
  • Cormen, Leiserson, Rivest, Stein Introduction to Algorithms, chapter 18

Whenever I’m using online content by other authors, it’s always used by their consent.

Let’s start with a definition of an index. An index is a data structure that takes a property of records as input – typically the value of one or more fields – and finds the records with that property quickly [Molina, Ullman]. It is equivalent to say that an index allows us to find a record without having to scan all the possible records. The fields upon which an index is based are called search key, or just key.

To get an idea about indexes work, let’s consider following simplification. A sequential file is a data file for a relation which is sorted by primary key (we are assuming that all relations have a primary key). Note that it has some implications – all operations (search, insert, delete) have average complexity O(n), where n is a number of records. If a sequential file grows really big (has a lot of records), using it may become a problem.

If records are big, we could devise following method of indexing – create a file, called index file, in which we keep following pairs for all records from our sequential file – primary key value and pointer to the record in sequential file. Such index is called dense index, since it contains all values of primary key. Of course, it does not give much of a performance boost, but if the records in a table are big it’s easier (quicker) to scan such index than a table itself (especially if the index fits in memory). The data retrieval is done by one disk I/O since the index contains pointer to data in data file. The complexity measured in terms of number of records stays the same – it’s still O(n) on the average.

If a dense index were too big to give a benefit, we could take a shortcut and have a pointer not to every record, but, say, every fourth record. This is a sparse index, which trades a bit of speed of dense index for smaller size. In our case a sparse index is four times smaller than a dense index, so searching it would follow such procedure (we are searching for primary key value of x):

  1. Find the biggest value y in a sparse index, such as y <= x.
  2. Go to data file and check if primary key value in a record referenced by a pointer from sparse index is equal to x. If yes, we’re done.
  3. [Loop 4 times] Check the next record if primary key value is equal to x – if yes, search is complete.
  4. If primary key value of x is not found in the loop, it is not contained in the relation.

Note that step 1 in this procedure is also O(n) on the average, but with a lower coefficient hidden in a big O notation. It is also possible that a partial table scan from step 3 induces additional I/O, so having a sparse index “too sparse” might cause extra I/O.

In order to overcome this, we can create multi-level index structure – first level being either dense or sparse, whereas next levels have to be sparse indexes (dense index in levels 2+ does not provide any performance boost). The idea is to benefit from smaller record size in index, so the index page contains more records and it requires less I/O. Therefore the procedure above would be changed to (remember that 4 is our choice here, it can be a different number):

  1. Find the biggest value y in a sparse index, such as y <= x.
  2. Go to first-level index, record next 4 records and look up the primary key value of x.
  3. If primary key value of x is not found, it is not contained in the relation.

This kind of structure becomes very similar to B-tree with one difference – a tree can grow as deep as necessary by itself, but multi-level index has arbitrary depth. Owing to this, the pessimistic example in multi-level index still has O(n) complexity, but the tree lookup will be O(log n). With this comment, it is time to end part 1 of my index theory summary and return in the future with more information about B-trees.

Posted in SQL Server | Tagged: , , , , | Leave a Comment »

[Story] Terrible performance problem

Posted by Szymon Wójcik on September 7, 2011

Hello again and welcome to new series categorized informally as “DBA stories”. This is what it means – stories from real life depicting strange, bizarre, sometimes funny (sometimes not) but somehow interesting to a DBA. The choice of them is arbitrary – as long as I have my stories, I will try not to include stories posted by someone else. And remember, these things happened, so they might as well happen to you!

OK, so for the beginning a situation that made me stay up whole night once. I get a phone call from help desk agent around 11pm saying that one of mission-critical systems suffers from a poor performance. OK, I say, let me see – I log on and there it is, CPU load 100% with SQL Server barely responding due to load. The system uses transactional replication to maintain a warm-standby and I started thinking of invoking business continuity procedures. Then I saw that a 10 GB database has 40 GB of log which is growing by 1 GB every half an hour. So, it’s a database problem and I start digging.

Service restart for SQL Server didn’t help, of course – it didn’t remove the source of the problem. So I started looking for the source with Profiler and then I found it.

The clue was – one of the components received a patch with a new functionality, which – from my (DBA) view – took all records in one of the tables and in a loop updated all records in each iteration (for each record there was UPDATE issued without WHERE). There were like 5000 records in this table, so each execution of this procedure updated ~25 million records, and that’s not the end of it. This procedure was run once a minute! That’s why the log was growing so fast and replication could not simply keep it up as well.

Once the problem was found, one quick call to application support team with request to disable this functionality solved it immediately. How did it pass QA? you might ask. Well, there are two morals:

  1. Keep your QA environment as similar to production as possible.
  2. Don’t trust your developers – have it checked two or three times.

Posted in DBA stories, SQL Server | Tagged: , , , | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 45 other followers