Yet another SQL Server DBA…

Just another SQL Server DBA blog

Archive for the ‘DBA stories’ Category

SQL Server upgrade from 2005 to 2008 R2 – more notes

Posted by Szymon Wójcik on May 30, 2012

I have already put down some notes about upgrading SQL Server from 2005 to 2008 R2, but since it’s been a few weeks I gained some useful experience. I’ll divide it into two parts – what works and what doesn’t, so it’s pretty much like a standard lessons learnt document. It’s all pretty fresh – I had two upgrades in May, both successful, but not without hassle (self-inflicted, of course).

What works:

  1. Copying installation source locally instead of installing from a network share – network disruptions have no impact on upgrade. I had my upgrade at the same time as LAN maintenance and it had a minimal impact on SQL Server upgrade.
  2. Having a local account with administrative rights – in case you can’t use built-in Administrator account you might consider creating a local account and adding it to local Administrators group as it will free you from network/login issues. Just remember to clean up afterwards.
  3. Upgrading from SQL Server 2005 databases in a compatibility level 90 – straightforward, never met a single one that had problems after setting compatibility level to 100.
  4. Migrating of Reporting Services between SQL Servers (both 2005) by means of backup\restore – yes, but in 2005 I had to install SSRS, initialize its database and overwrite it on restore. Never tried it in 2008+, but I presume it works exactly the same. When I tried to restore to a new database, and then reconfigure SSRS, it wasn’t able to initialize. However, there’s a trick with deleting encryption keys which might also work in this case – worth checking in free time.

What doesn’t work

  1. Upgrading a read-only database – renders it unusable. I did it (nothing to be proud of) and to fix it I had to put it in emergency mode, detach, reattach and set it back to read-only. Please note it’s what I did and worked for me  – checked with DBCC CHECKDB afterwards and the database was fine. Why emergency mode, you might ask – the database startup fails during recovery phase, because it requires to be upgraded as server version has changed while it’s read-only. So it leaves you with a 2008 R2 server using databases with internal version 661, but a read-only database remains with internal version 611 (because it’s read-only), so the server issues an error that data files are inaccessible and you can’t even detach it. The emergency mode is used here to suppress recovery and allow detaching – I wasn’t brave enough to try setting the database to read-write after emergency mode on – also worth checking in free time if it works and what happens to the database. Alternative – restore latest backup (since the database ia read-only, it should not change) to another SQL Server 2005 box, set it to read-write and copy to 2008 R2 instance. Lucky me as the whole database was read-only – if it were only a filegroup inside, that would be a bigger issue (see Paul Randal’s post at SQLMag.com about it).
  2. Upgrading SQL Server 2000 databases to 2008 R2 – some of them contain some deprecated features like old-style joins and it turns out that you can upgrade the server, but have to leave the database on compatibility level 80. I’ve seen this happen a few times, and the only solution is to review database structure and remove all the deprecated features. The thing is – you might not always have time or resources to do it, but it’s where SQL Server Upgrade Advisor comes in, the reports can help you track where the problems can be.
  3. This is me grumbling about a thing that is by design – a number of restarts required to upgrade your SQL Server, especially on Windows Server 2003. If you didn’t reboot since last patching session, restart. If you don’t have Windows Installer 4.5 (and I don’t since I’m starting from 2005), next restart. Upgrade SQL Server, next restart. After Service Pack, another restart. Now multiply it by a number of instances on the server that are upgraded, add some time for some preparations before, clean up after, plan for some contingency, and you can barely squeeze two instances in 6 hours. Luckily this isn’t one of the new HP ProLiant DL380 G7 servers that take 10 minutes to reboot, spending 9 minutes on power and thermal calibration. This is also an important point – when planning your maintenance window, consider all the activities. If the installation takes 30 minutes and updates another 15, both require a restart which takes 15 minutes, plus one upfront, then your downtime is not 45 minutes, but 90! It is crucial because if you miscalculate, you might not fit in the maintenance window and then you have a service down, not to mention that if something goes wrong, you need some time for rollback plan (which cannot fail – but imagine that tape drives are busy with your 70%-completed Exchange backup with still 2 TB to go and you have to wait till it’s done to restore 10 GB database).

Now, I might have another set comments pretty soon – a round of upgrades is waiting for me in June plus a retirement. If you have any comments or questions, feel free to share.

Posted in DBA stories, SQL Server | Tagged: , , , , , , | Leave a Comment »

SQL Server 2012 beta exams – my feelings

Posted by Szymon Wójcik on April 16, 2012

Earlier this month I had the opportunity to participate in beta tests of SQL Server 2012 certification exams. I have already described what exams are prepared and what the new certification path is in one of previous posts, so let me briefly describe how the exams loked like.

First of all, it’s a standard Prometric test. You come to testing center with two IDs (both with a signature, one of them needs to have a photo of you) half an hour before the exam, sign all the paperwork, leave all unnecessary stuff in a locker and wait till the test administrator calls you. I understand that the exam rules are pretty standard, but I love the part which says “no weapons are allowed at the Prometric test center”. Well, perhaps there are places in the world, where you have to fight your way to the test, fortunately Poland is not one of those.

I signed up for two beta exams (71-461 and 71-462) on one day, mostly due to time pressure. I am currently heavily loaded at work due to new system implementations and I didn’t have much time to do some real preparations. I had a bunch of VMs set up at my home PC lab, spent some time installing and configuring SQL Server 2012 but didn’t actually go deeper into it, for instance I had planned to establish AlwaysOn HA solution as a practice, but it ended up with reading Books Online a day before the exams. I had no idea what the certifications would be then apart from recertification requirement, so it was a blind shot made mostly for fun, but I guess I did OK – it was either 5 or none, if you were thinking certificate-wise.

There were two other guys waiting with me to take their tests and as usual I had bad luck – the computer assigned to me was out-of-date and received Prometric updates which had been applying for half an hour. It did not impact my test a lot, but I got a bit anxious at one moment and had to rebuild my focus. Then I left the room to catch my breath and prepare for second exam, but it turned out that the other computer I was supposed to use for testing hung and required assistance. All in all, I spent almost 6 hours at the testing center and finally took the wrong door while leaving.

The query exam (71-461) was an interesting one. I was expecting a lot of CRUD-like stuff. Instead, there were questions about database design and fundamental concepts such as keys and referential integrity, there was also a bit of DDL, all of course focused on features new to SQL Server 2012. If I were preparing for 70-461 exam, I would revisit:

  • new built-in functions, such as LEAD, LAG, TRY_PARSE
  • data types, paying special interest to date/time
  • DDL stuff including triggers vs stored procedures
  • locking and isolation levels and their impact on query results

Those were things which probably have impacted my result. What I didn’t like and strongly argued about it in the comments were questions related to cursors – the feature I don’t like and try to avoid as much as possible. With CTE being available since SQL Server 2005 I would expect cursors to lose their usefulness, but the test could make me think otherwise.

When it came to 70-462 – an administration exam – I felt absolutely confident about it. This is by all means the part of SQL Server that I am most exposed to (my development skills have gone down lately) and I find it most interesting. If you prepare for it, pay special attention to:

  • backup and restore strategies, with reference to recovery models
  • extended events – this is new diagnostic feature introduced in SQL Server 2008, so this is the first opportunity to test it
  • AlwaysOn – there is a big hype about it with SQL Server 2012 and you want to know everything about it
  • contained databases – what are they and how to convert standard database to a contained one

I had a few questions about database mirroring, which I found strange given that it is considered deprecated in next versions of SQL Server. I even argued about one of those, which was single-choice question and from what I understood two options were 100% correct.

Speaking of exam quality – after reading a few sources I expected a lot of bugs and holes. I was surprised (in a positive way) to see only one error and a number of inconsistencies. The results are going to be available at the moment when real exams go-live, which is about June 12. I am looking forward to results – it was a very interesting experience.

Posted in DBA stories | Tagged: , , , , , , , , | 5 Comments »

SQL Server 2005 SP3 to 2008 R2 upgrade – planning and issues

Posted by Szymon Wójcik on April 15, 2012

As part of my DBA activities I was tasked with performing feasibility study of upgrade of SQL Server 2005 to 2008 R2, which produced a nice roadmap I’m using now to perform step-by-step migrations. I thought it was going to be quite smooth, but sometimes – mostly due to inconsistencies in environment – issues happen, unfortunately.

First of all, if you happen to lead a project of such migration be sure to run SQL Server Upgrade Advisor to identify blocking issues early and solve them in advance. This may include old-style joins using *= or =* (if you have compatibility model 80 databases). Most SQL Server 2005 and later instances are compatible so it should be quite straightforward, but do yourself a favour, check it and be on a safe side.

Second, when you’re approaching a migration, clean the environment up as much as possible. Here’s my case – I had a server where databases had no owner, so I fixed it. But then it turned out that DEP is prevented installation of SQL Server from remote source. If you’re on Windows Server 2003, chances are that you need .NET 3.5 and Windows Installer 4.5 and it requires additional restart (that’s what happened to me). When I managed to run SQL Server Setup, final upgrade checks returned “SQL Server health check” failed. This was a bit tricky, because it required my account to get “Debug programs” privilege on local machine (careful! it needs logout), which took another 30 minutes.

Consider the restarts also take time. You might need 2 or 3 restarts, if going from 2005 SP3 to 2008 R2 SP1 and if you’re running servers like HP ProLiant DL380G7 which are quick, but take awful lot of time to boot, you may count 30 minutes for those restarts. 30 minutes in 2 hours maintenance window is 25% and it’s really a lot when a quarter of downtime you’re off and waiting for server to come up.

If you’re responsible for upgrade, have some time on side for such unexpected situations – if you think a maintenance window of 2 hours is sufficient, consider to have 4 hours booked. This will give you time to solve things if they appear without great time pressure and even if you happen to have a total failure that you need to do restore, you’re still on time.

Third, do all the backups needed. Take a backup of user databases and master prior to backup, take one after you complete upgrade and if you apply further SPs and CUs, backup after them as well. If you use a tape drive, do a full backup of your whole server after you complete migration. And don’t forget to verify the backup when you take it.

All those tips could be briefly wrapped up in a PM proverb “if you fail to plan, you plan to fail”. You can’t argue with that kind of wisdom.

Posted in DBA stories, SQL Server | Tagged: , , , , | 1 Comment »

[Story] Deployment with missing indexes and how to deal with that

Posted by Szymon Wójcik on October 14, 2011

Well, this is a kind of story that happens to every DBA once in a while. You must have experienced this if you had an application developed by a third-party, which has been poorly load-tested. The symptoms are: you test an application in QA environment and confirm functionalities on a small sample, but once you go live suddenly the performance suffers badly. You check server performance and SQL Server is eating 100% of CPU, HDDs are screaming, disk queues are growing longer and all the important queries time out.

As you may expect, the indexes are missing. If it’s the case, you will go down – but how fast depends on your load, number of connections and transactions, average users’ count and some other factors. The heavier the worklad, the quicker the crash. Luckily, you can defend from that.

First of all, if you don’t have any kind of warning system in-place, which will notify you when queries are starting to run longer, you can only respond. In the beginning you should prevent those cases, have your queries monitored with a SQL trace filtering queries high CPU, Reads and Writes. You may also try setting up a SQL Server performance alert or use extended events for that purpose, but a trace is the easiest, plus you get the data you may provide to Database Tuning Advisor as input and have a quick fix in a short time.

What if you don’t have query monitoring? You’re not completely doomed. Use the missing indexes DMVs to quickly find poorly peforming queries/tables, but be careful – it can only show 500 entries, so you may not get all the results you need. The best solution might be to use Profiler to record a workload and then optimize it, but since the performance already suffers you have to decide whether to run it or not. You may use new Activity Monitor and Recent Expensive Queries feature to probe for candidates but again, it is recommended to run DTA on the queries.

I had this situation a while ago when introducing a solution which monitors factory equipment and records it’s status. Initial tests were completed successfully, but shortly after going live with whole production floor a new functionality, which was highly expected by users, suddenly caused a system to stop, threatening to stop the business. We made a call to run Profiler in production environment and tuned the queries manually, which also involved some assembly reverse engineering with help of Red Gate’s Reflectorto identify some queries in order to optimize them later. The whole situation was fixed within 3 days, bu the second and third day were not looking so bright.

Posted in DBA stories, SQL Server | Tagged: , , | Leave a Comment »

[Story] Database collation change issues

Posted by Szymon Wójcik on October 9, 2011

So another story coming… This one distrubs chronological order, as I have already posted a story about replication, which ultimately was caused by the actions depicted here.

A little background – there is a database which is about 10 years old, which had a refresh just recently. During development it turned out that in order to keep reporting working a collation change is required, mostly due to data interchange via linked servers. It was easier than to work with collation on a object level, instead of trying to identify all the queries and modify them to align collation. As you might expect, a simple default database collation change was not possible due to check constraints being dependent on text columns. I had to drop and recreate those check constraints after collation change, but then it also appeared to me that while applying database_default collation on columns in some tables it couldn’t be completed simply due to indexes on those columns. So those indexes had to be dropped prior to table alteration just to be recreated after that. A lot of scripting was involved and whole operation took about 8 hours to complete.

One more recollection – since it was in a virtual machine environment, low disk space occured in a meantime, so if you come across such massive collation change, keep in mind that a lot of disk space is required for logging. In this case I required about 4 times of database size for logging, not to mention backups before and after.

Posted in DBA stories | Tagged: , | Leave a Comment »

[Story] Replication from remote site

Posted by Szymon Wójcik on October 5, 2011

Welcome to a next part of my DBA stories. This time I’m going to tell the story about replication problems I have experienced recently.

The task was pretty standard – set up a transactional replication of a database (size of few gigabytes) from a remote site to central office using our WAN and schedule it to synchronize at night. At first it seemed easy, there was one week deadline for it, the database had already been replicated in the past, so I thought it’s not a big deal.

First it turned out that about a half out of 300 tables in the database does not have a primary key. I had to contact development team to solve it – it appeared that it was caused by change of collation on a database which they had requested before! Luckily, some old copies existed so they could be scavenged for definitions of missing objects.

After fixing missing primary keys I proceeded with setting up publication. I chose everything to be published (there were no filtering requirements), generated the snapshot but then it couldn’t be applied at the Subscriber! I wasted two days (since I ways trying to get it done in line with assumed schedule) for trying to get a snapshot applied. After research I found that views were causing all the mess.

Two of the views were referencing linked servers which weren’t defined at the Subscriber.

One of the views was referencing a table in another database which existed at the Publisher, but not at the Subscriber.

One of the views was actually a disaster. I spent over an hour trying to get past it, until I saw it last. The script was like that:

CREATE VIEW viewName
AS

/* CREATE TABLE someTable
( column1,
column2,
...
columnN)
GO

*/

... actual view definition

The message in the Replication Monitor was “unable to find closing comment mark ‘*/’”. And after some it hit me – GO! Seeing all the issues and knowing that I already missed the deadline I decided to remove the views from the publication and recreate them at the Subscriber, if necessary.

So then the snapshot was OK, so I started synchronizing it during the day. But what happened was that network bandwidth was maxed, so literally al network traffic to remote site slowed to a crawl. The snapshot was applied in 14 hours, but now everything is back to normal. The correct schedule is applied, but I think I have some more grey hair due to this dam replication. It’s not my favourite high-availability technology.

Posted in DBA stories | Tagged: , , | 3 Comments »

[Story] Awful scripting

Posted by Szymon Wójcik on September 19, 2011

It happened to me few years ago, while I still was more developer than a DBA. I needed to create a database in development environment with identical structure to the one in production. Easy thing, you say, have a "Generate Scripts” wizard do the job. Well, you’ll see.

By the way, I would already omit one crucial detail – it was 3pm on Friday afternoon.

Since I don’t have enough rights to do it on my own (SOX forbid!), I go with a registered support request to admin team asking for a favour. The beginning was simple – Generate Scripts, all objects, new query window, 500 objects, a minute and we’re done. OK, let’s run, says he and clicks ‘”Execute”.

OMG, I thought seeing all those red messages in the query pane. Unable to login.

In about half an hour users start calling that they are unable to continue production and a factory is going to stop.

We call a senior admin with all the information we have and he says “restore the master”. “What?!” we ask. So in about an hour he drops by (he was already home), brings down the server, runs it in a single-user mode and restores master database from warm standby to primary production server. It takes him half an hour, factory doesn’t stop, the world is saved and we (me and the other guy who clicked Execute) are laughed at for a year. Morals?

  1. Be careful what you wish (script) for – it turned out that we scripted DROP and CREATE, for unknown reason. Double and triple check.
  2. If you have a script, but you’re unsure of it’s behaviour, don’t ever test it on production.
  3. If you have a job to do and it’s Friday afternoon, ask yourself if it can wait until Monday morning – you might have a weekend off.
  4. Have limited trust in the junior guys – let someone more experienced supervise them a bit.

Posted in DBA stories | Tagged: , , | Leave a Comment »

[Story] Terrible performance problem

Posted by Szymon Wójcik on September 7, 2011

Hello again and welcome to new series categorized informally as “DBA stories”. This is what it means – stories from real life depicting strange, bizarre, sometimes funny (sometimes not) but somehow interesting to a DBA. The choice of them is arbitrary – as long as I have my stories, I will try not to include stories posted by someone else. And remember, these things happened, so they might as well happen to you!

OK, so for the beginning a situation that made me stay up whole night once. I get a phone call from help desk agent around 11pm saying that one of mission-critical systems suffers from a poor performance. OK, I say, let me see – I log on and there it is, CPU load 100% with SQL Server barely responding due to load. The system uses transactional replication to maintain a warm-standby and I started thinking of invoking business continuity procedures. Then I saw that a 10 GB database has 40 GB of log which is growing by 1 GB every half an hour. So, it’s a database problem and I start digging.

Service restart for SQL Server didn’t help, of course – it didn’t remove the source of the problem. So I started looking for the source with Profiler and then I found it.

The clue was – one of the components received a patch with a new functionality, which – from my (DBA) view – took all records in one of the tables and in a loop updated all records in each iteration (for each record there was UPDATE issued without WHERE). There were like 5000 records in this table, so each execution of this procedure updated ~25 million records, and that’s not the end of it. This procedure was run once a minute! That’s why the log was growing so fast and replication could not simply keep it up as well.

Once the problem was found, one quick call to application support team with request to disable this functionality solved it immediately. How did it pass QA? you might ask. Well, there are two morals:

  1. Keep your QA environment as similar to production as possible.
  2. Don’t trust your developers – have it checked two or three times.

Posted in DBA stories, SQL Server | Tagged: , , , | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 45 other followers