Late start into 2014

Long time no post. There’s a number of reasons for that, all of them “overridden” by my laziness. 馃檪 I had a strong resolution of keeping good progress with the blog, but reality proved otherwise. Anyway, I am coming back to blogging since there are some ideas started or good to be shared.

SQL Server

First of all, SQL Server 2014 is publicly available since April 2014, and it’s hot. I’m not willing to copy/paste official marketing statements, but In-Memory OLTP (former Hekaton) and buffer pool extensions seem very promising, especially now, when the hardware prices are declining and there is more capacity available (in terms of both hard drive and memory). I tried it at CTP2 stage and I’m going to return聽to RTM for more testing action聽– I still haven’t experienced it live in production.
I also have some drafts of XML-related posts – how to process XML in SQL Server and mostly how to extract data to XML. This topic was relevant to me at some point and I had problems finding good comparison with execution statistics. Since I’m revisiting this, I’ll wrap it up in a series of posts.

Oracle

With a change in my job, I’m getting to know Oracle. I’m responsible for deploying an application that is using Database Engine and is hosted on WebLogic Server (both of them 11g). This is a completely new experience for me, so it’s possible some posts related to those areas, for example transition from SQL Server to Oracle.

ALM

Earlier on this year, I challenged myself and passed three exams that are required for MCSD: Application Lifecycle Management. I am starting to think in terms of ALM more and more. Last year I made some experiments with TFS 2013 while it was still in Preview stage, focusing on Continuous Integration and Continuous Delivery. Some time passed – TFS 2013 not only went RTM but already got Update 2 – and it might be reasonable to come back to that as Microsoft is incrementally adding new releases to each update. Another point worth noting here is that Microsoft is giving a TFS to everyone for free at visualstudio.com, of course with limited functionality but even the free plan gives you unlimited projects for up to 5 users, 60 minutes of build and 15000 virtual user load minutes, so a great place to start.

Others

Phase change cooling of Celeron 430 on DFI DK P35 馃檪 -50 Celsius on CPU done!


I came back actively to overclocking – I liked pushing an envelope a lot since my beginning with computers. I got myself a single stage phase change cooling unit (which is basically a stripped refrigerator – see the picture to the right) and a load of CPUs and RAM dating between 2002 and 2010. There’s going to be a lot of fun with that and for those who are interested you can follow me on hwbot.

And I decided to finally change the boring Andreas theme to something more modern. 馃檪

Building a clustered test lab with Windows Server 2012 RC and SQL Server 2012 RTM Evaluation

I have already described what my test lab at home looks like, however I find it interesting enough to provide details how to set it up. This post will be first in series and will cover planning and preparing the environment. Next posts will be about:

  • installing Windows Server 2012 in a VHD and聽enabling Hyper-V role
  • preparing virtual machine template and setting up virtual machines
  • preparing environment for setting up failover cluster聽
  • installing SQL Server on 2-node cluster

I am using concepts shown in a blog post by Jose Barreto describing this setup with Windows Server 2008 R2 and SQL Server 2008 R2.聽We will use Windows Server 2012 and SQL Server 2012 but main idea is the same.聽To build this lab you require:

  • a PC which:
    • has a CPU that supports virtualization in hardware (for Intel you want a CPU supporting VT-x technology, for AMD – AMD-V) – most聽non-low-end聽CPUs do
    • has minimum of 4 GB RAM (8 GB or more recommended)
    • has 50 GB free HDD space (up to you)
  • some software – all available for free:
    • Windows Server 2012 RC聽or Hyper-V Server 2012 RC聽installation media (I will use “full” Windows Server as I didn’t do it using command-line/PowerShell in Hyper-V Server) – I don’t recommend using VHD provided by Microsoft since it expands from 8 to 40 GB during boot time and you’d need lots more disk space ;
    • SQL Server 2012 Evaluation聽– valid from 180 days, but it’s more than enough for start (you might try Developer Edition if you have one – you don’t have to worry about expiration then)
  • whole procedure takes about 8 hours (depends on your system speed, mainly I/O, so if you’re blessed with RAID 0 on SSD drives you might actually do it much faster) – be warned!

What we’re going to do is we will install fresh Windows Server 2012 operating system in a VHD (so as not to mess with existing setup – you may install on a physical disk if you don’t mind, performance-wise it’s negligible) – I assume that a main OS on the PC we’re using is Windows 7/Server 2008 R2聽or newer, because XP does not support native-boot from VHD (not sure about Vista/Server 2008, but does anyone use it anymore? :P). Within this newly installed Server 2012 we will enable Hyper-V role and set up 4 virtual machines using one base template and four differencing disks. Then we will configure those machines like this:

  1. First machine is a domain controller and聽a DNS server – clustering requires聽a domain and Active Directory requires DNS.
  2. Second machine is storage controller – in the post by Jose (mentioned above) separate download of iSCSI Target is required for Windows Server 2008 R2, however it’s incorporated in Server 2012 – we don’t have to install anything, only enable additional OS features and configure them.
  3. Third machine is an active node of SQL Server failover cluster.
  4. Fourth machine is a passive node of SQL Server failover cluster.
  5. You may add more nodes to the cluster or consider setting up AlwaysOn.

This makes our lab completely standalone, so it can operate even if there is no connection to the outside. If it’s not required to have it this way, for example – you have a domain controller – you may skip creation of first VM. If you have a NAS/SAN, you may use it and skip the second machine. You may also consider combining first two machines into one (if you’re low on resources and don’t want to spawn a horde of machines). AD services in case of a domain so huge as this one will not be very demanding, definitely we’re not going to have a workload on storage impact a domian controller significantly. On the other hand you may also try setting up additional machines to see how it works – I made some with System Center 2012 RC, but you can also build a load-balanced SharePoint farm with a clustered database engine.

When it comes to network – no additional configuration will be done as clustering in Windows Server 2008 removed the need for dedicated heartbeat, so as long as our VMs see each other it will be fine (you may try to include DHCP server in this setup but remember to keep it聽contained so that it does not try to provide addresses in every network it may find). Just remember – live system may not have exactly the same architecture and whatever works for our homemade cluster may not be enough when it’s deployed into production (but hey, that’s a lab we’re building!).

When it comes to resource assignment for VMs I use the following pattern (I have Phenom II X4 CPU and 16 GB in my rig):

  1. For domain controller – 2 CPUs and 768 – 1024 MB RAM.
  2. For storage server – 2 CPUs and 768 MB RAM.
  3. For SQL Servers – 4 CPUs and 2048 – 3072 MB RAM.
  4. Rest – whatever I feel like/need.

When I’m setting this on a laptop where I have memory limitations (4 GB RAM) I cut down first two VMs to 512 MB RAM and SQL Servers to 1024 – 1280 MB. This allows me to remain operable, however sacrificing a lot of performance. Therefore 8 GB RAM is recommended.

And this concludes our preparation. Next post will get us through host setup – we will install Windows Server 2012 in a VHD and make it bootable so stay sharp!

Object whitespace-naming madness

Have you ever tried creating objects and giving them strange names? I made some experiment at one of lab servers, main focus was space (you know, the long key at the bottom of the keyboard :D).

So, without further ado – the first object to be tested is a database. Run:

create database [ ]

and you get your favourite Command(s) completed successfully response. A quick look into Object Explorer shows that a space-named database was indeed created. Let鈥檚 script it and see what a monster we just spawned:

USE [master]   
GO  /****** Object:  Database [ ]    Script Date: 02/15/2012 22:21:54 ******/   
CREATE DATABASE [ ] ON  PRIMARY     
( NAME = N'', FILENAME = N'C:\SqlData\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\ .mdf' , SIZE = 1280KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )   
 LOG ON     
( NAME = N' _log', FILENAME = N'C:\SqlData\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\ _log.LDF' , SIZE = 504KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)    
GO  ALTER DATABASE [ ] SET COMPATIBILITY_LEVEL = 100   
GO  IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))   

begin    
EXEC [ ].[dbo].[sp_fulltext_database] @action = 'enable'    
end    
...all the ALTERs go here...

The filenames on the HDD are .mdf (space.mdf) and _log.ldf (space_log.ldf), which are perfectly valid filenames as far as NTFS is concerned, but clearly database data file has a name of an empty string (weird). It seems that in this case some kind of trimming has taken place, but not in the case of database log file name which starts with a space.

We can go forward and try creating another space-named database. Try

create database [  ]  

Notice slight difference in both scripts 鈥 first database is called one space, while second is two spaces. The result is at least weird (first result from SQL Server 2005, second from 2008 R2):

Msg 1801, Level 16, State 3, Line 1  
Database 鈥  鈥 already exists. 

or

Msg 1801, Level 16, State 3, Line 1   
Database '  ' already exists. Choose a different database name.    

Funny, isn鈥檛 it? It seems as if SQL Server trims the database name of whitespaces before actually attempting to create such an object. I鈥檓 not wondering about added value that space-named databases might give, but certainly something is not right. If I could create a database with a name of 鈥 鈥, why not 鈥 鈥? Maybe other whitespaces will work as well?

In fact, they do. Let鈥檚 try with a bell character (ASCII code 7). First obstacle is that SSMS editor does not let you type in bell character, so you have to work our way around with standard T-SQL CHAR function. But then operating system will not let you create a file with bell inside of a name, so you have to change the file names. In the end, the code is like this:

declare @DBName sysname   
declare @createQuery nvarchar(2000)  select @DBName = CHAR(7)   
select @createQuery = '    
CREATE DATABASE [' + @DBName + '] ON  PRIMARY     
( NAME = N''1'', FILENAME = N''C:\SqlData\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\1.mdf'' , SIZE = 2048KB , FILEGROWTH = 1024KB )    
 LOG ON     
( NAME = N''1_log'', FILENAME = N''C:\SqlData\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\1_log.ldf'' , SIZE = 1024KB , FILEGROWTH = 10%)    
'  execute sp_executesql @createQuery

I expected the bell character to act like one, so I opened SQLCMD and issued standard

SELECT * FROM sys.databases

But, to my surprise bell-character-named database is displayed as new line in the result set and my hopes for audible SQL are gone. 馃檪

Let’s go into our space-named database just to try to create a schema and a table inside with a column of type int. Try following piece of script:

use [ ]    
go    
create schema [ ]    
go    
create table [ ].[ ] ( [ ] int)    
go    

It really works, and the Object Explorer does prove it. You might actually test by issuing

INSERT INTO [ ].[ ].[ ].[ ] ([ ]) VALUES (1)

It would be even funnier if the column was of char type, then you could something like that (just don’t get mixed with delimiters and text escaping :P):

INSERT INTO [ ].[ ].[ ].[ ] ([ ]) VALUES (' ')

Wrapping it all up – since the post contained a lot of humour – I would rather expect whitespaces not to be permitted in object names at all, or at least that an object name has to contain one or more characters. For the sake of clarity I would even go further 鈥 to prevent crazy people from going rampant let object names be like variable names in most programming languages 鈥 contain only letters, digits and underscore, but cannot start with a digit.
On the other hand, you might say it’s just how flexible SQL Server is.

SQL Server Hardware Choices Made Easy by Glenn Berry – a quick review

I have come across Glenn Berry’s e-book by accident on twitter while ago by accident, downloaded it and then liked it very much. I have read it over one evening and first thing I did next day after arriving to work was forwarding it to all my coworkers.

What was the most interesting for me (as I share Glenn’s interest in hardware) was the detailed description of several interesting configurations using current hardware, as well as some future trends (like enterprise SSD drives, AMD Bulldozer or new Intel CPU). It just picks up where most of the books and guides leave you – for instance when most authors tell you to set up RAID-1 for logs and RAID-10 for database files, Glenn gives you an overview what actual models and what exact configuration may be applied and what is the cost of it. Furthermore, he shows what CPUs and in which configuration are preferred depending on your worklad (OLTP differs from OLAP). You will see that RAM is cheap and you don’t have to save on it. Last but not least, you will get a pocket-friendly suggestion on how to build a workstation for SQL Server.

If you have some second thoughts while picking up hardware for your new SQL Server, you have always bought 1-2-3-style machines (CPU/RAM/HDD count) or you’re simply interested in computer hardware and databases, this book is for you. Trust me, you won’t be disappointed.

[Story] Awful scripting

It happened to me few years ago, while I still was more developer than a DBA. I needed to create a database in development environment with identical structure to the one in production. Easy thing, you say, have a "Generate Scripts鈥 wizard do the job. Well, you鈥檒l see.

By the way, I would already omit one crucial detail 鈥 it was 3pm on Friday afternoon.

Since I don鈥檛 have enough rights to do it on my own (SOX forbid!), I go with a registered support request to admin team asking for a favour. The beginning was simple 鈥 Generate Scripts, all objects, new query window, 500 objects, a minute and we鈥檙e done. OK, let鈥檚 run, says he and clicks ‘鈥滶xecute鈥.

OMG, I thought seeing all those red messages in the query pane. Unable to login.

In about half an hour users start calling that they are unable to continue production and a factory is going to stop.

We call a senior admin with all the information we have and he says 鈥渞estore the master鈥. 鈥淲hat?!鈥 we ask. So in about an hour he drops by (he was already home), brings down the server, runs it in a single-user mode and restores master database from warm standby to primary production server. It takes him half an hour, factory doesn鈥檛 stop, the world is saved and we (me and the other guy who clicked Execute) are laughed at for a year. Morals?

  1. Be careful what you wish (script) for 鈥 it turned out that we scripted DROP and CREATE, for unknown reason. Double and triple check.
  2. If you have a script, but you鈥檙e unsure of it鈥檚 behaviour, don鈥檛 ever test it on production.
  3. If you have a job to do and it鈥檚 Friday afternoon, ask yourself if it can wait until Monday morning 鈥 you might have a weekend off.
  4. Have limited trust in the junior guys 鈥 let someone more experienced supervise them a bit.

What is an index – a theory (part 1/4 – dense and sparse indexes)

Since I had written a few posts about indexes before, I thought it would be good to shed some light on theory behind indexes in SQL Server. It would be divided into four parts:

  1. Dense and sparse indexes (this post).
  2. B-trees 鈥 searching for a value and operation efficiency.
  3. B-trees 鈥 inserting an item, operation cost and page splits.
  4. B-trees 鈥 deleting an item, operation cost and page merge.

By no means I am comfortable with saying that it鈥檚 a definitive source of knowledge about indexes in SQL Server. It鈥檚 a result of research conducted by me based on various sources. If you are interested to know more, try reading those books first:

  • Molina, Ullman Database Systems 鈥 The Complete Book, chapter 13
  • Knuth The Art of Computer Programming, chapter 6.2.4
  • Cormen, Leiserson, Rivest, Stein Introduction to Algorithms, chapter 18

Whenever I鈥檓 using online content by other authors, it鈥檚 always used by their consent.

Let鈥檚 start with a definition of an index. An index is a data structure that takes a property of records as input 鈥 typically the value of one or more fields 鈥 and finds the records with that property quickly [Molina, Ullman]. It is equivalent to say that an index allows us to find a record without having to scan all the possible records. The fields upon which an index is based are called search key, or just key.

To get an idea about indexes work, let鈥檚 consider following simplification. A sequential file is a data file for a relation which is sorted by primary key (we are assuming that all relations have a primary key). Note that it has some implications 鈥 all operations (search, insert, delete) have average complexity O(n), where n is a number of records. If a sequential file grows really big (has a lot of records), using it may become a problem.

If records are big, we could devise following method of indexing 鈥 create a file, called index file, in which we keep following pairs for all records from our sequential file 鈥 primary key value and pointer to the record in sequential file. Such index is called dense index, since it contains all values of primary key. Of course, it does not give much of a performance boost, but if the records in a table are big it鈥檚 easier (quicker) to scan such index than a table itself (especially if the index fits in memory). The data retrieval is done by one disk I/O since the index contains pointer to data in data file. The complexity measured in terms of number of records stays the same 鈥 it鈥檚 still O(n) on the average.

If a dense index were too big to give a benefit, we could take a shortcut and have a pointer not to every record, but, say, every fourth record. This is a sparse index, which trades a bit of speed of dense index for smaller size. In our case a sparse index is four times smaller than a dense index, so searching it would follow such procedure (we are searching for primary key value of x):

  1. Find the biggest value y in a sparse index, such as y <= x.
  2. Go to data file and check if primary key value in a record referenced by a pointer from sparse index is equal to x. If yes, we鈥檙e done.
  3. [Loop 4 times] Check the next record if primary key value is equal to x 鈥 if yes, search is complete.
  4. If primary key value of x is not found in the loop, it is not contained in the relation.

Note that step 1 in this procedure is also O(n) on the average, but with a lower coefficient hidden in a big O notation. It is also possible that a partial table scan from step 3 induces additional I/O, so having a sparse index 鈥渢oo sparse鈥 might cause extra I/O.

In order to overcome this, we can create multi-level index structure 鈥 first level being either dense or sparse, whereas next levels have to be sparse indexes (dense index in levels 2+ does not provide any performance boost). The idea is to benefit from smaller record size in index, so the index page contains more records and it requires less I/O. Therefore the procedure above would be changed to (remember that 4 is our choice here, it can be a different number):

  1. Find the biggest value y in a sparse index, such as y <= x.
  2. Go to first-level index, record next 4 records and look up the primary key value of x.
  3. If primary key value of x is not found, it is not contained in the relation.

This kind of structure becomes very similar to B-tree with one difference 鈥 a tree can grow as deep as necessary by itself, but multi-level index has arbitrary depth. Owing to this, the pessimistic example in multi-level index still has O(n) complexity, but the tree lookup will be O(log n). With this comment, it is time to end part 1 of my index theory summary and return in the future with more information about B-trees.

Index maintenance revisited

It鈥檚 almost a month since my post on index maintenance based on fragmentation, in which I attempted to show what can be done using simple script and some DMVs. There was some discussion on LinkedIn group SQLDBA about it and I would refer to it with few comments.

  1. The purpose of my script was purely to demonstrate the existence of sys.dm_db_index_physical_stats DMF, which in conjunction with other DMV yields very interesting results.
  2. The script presented by me was very simple, if not simplistic. It does not work well with partitioned indexes (my own findings), offline databases (as pointed out by Sten Westerback) and 鈥 what鈥檚 more important 鈥 it鈥檚 only criteria of operation is index fragmentation being completely unaware of size or other constraints. This does not make it a good candidate for running unmodified in production environments.
  3. As per another comment 鈥 indeed Microsoft provides very good example in Books Online regarding sys.dm_db_index_physical_stats, which deals with first two of points mentioned above.
  4. Finally, there is a fantastic maintenance tool developed by Ola Hallengren, which gives all kinds of options when it comes to database maintenance. It is successfully used by many DBAs all around world 鈥 if you need something to be working out-of-the-box, you might want to give it a try.

The reason why I鈥檓 writing this is that I don鈥檛 want to be blamed for taking a proof of concept (as my script was) and turning it into a primary maintenance solution. As usual 鈥 read, test, ask questions, adapt, share your findings, we all will benefit from that.