Well, this is the kind of story that happens to every DBA once in a while. You have probably experienced it if you have ever had an application developed by a third party and poorly load-tested. The symptoms: you test the application in a QA environment and confirm the functionality on a small data sample, but once you go live, performance suddenly suffers badly. You check server performance and SQL Server is eating 100% of CPU, the disks are screaming, disk queues are growing longer, and all the important queries time out.
As you may expect, the indexes are missing. If that is the case, you will go down – but how fast depends on your load, the number of connections and transactions, the average user count and some other factors. The heavier the workload, the quicker the crash. Luckily, you can defend yourself against this.
First of all, if you don’t have any kind of warning system in place to notify you when queries start running longer, you can only react. Ideally, you should prevent these situations: have your queries monitored with a SQL trace that filters for high CPU, Reads and Writes. You could also set up a SQL Server performance alert or use Extended Events for this purpose, but a trace is the easiest option – plus you get data you can feed into the Database Engine Tuning Advisor (DTA) as input and have a quick fix in a short time.
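If you prefer the Extended Events route, a lightweight session along these lines can capture expensive statements. This is a minimal sketch, assuming SQL Server 2012 or later (where `cpu_time` is reported in microseconds); the session name, thresholds and file path are placeholders you should adjust to your environment:

```sql
-- Capture completed statements that burned noticeable CPU or I/O.
-- Thresholds (1 second of CPU, 10,000 logical reads) are examples only.
CREATE EVENT SESSION [ExpensiveQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_id)
    WHERE cpu_time > 1000000        -- microseconds
       OR logical_reads > 10000
)
ADD TARGET package0.event_file (SET filename = N'ExpensiveQueries.xel');
GO
ALTER EVENT SESSION [ExpensiveQueries] ON SERVER STATE = START;
```

The captured `.xel` file can later be opened in Management Studio or queried with `sys.fn_xe_file_target_read_file`.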
What if you don’t have query monitoring in place? You’re not completely doomed. Use the missing index DMVs to quickly find poorly performing queries and tables, but be careful – the server keeps at most 500 missing index groups, so you may not get all the results you need. The best solution might be to use Profiler to record a workload and then optimize it, but since performance is already suffering, you have to decide whether the extra overhead of running it is acceptable. You can also use the Activity Monitor’s Recent Expensive Queries pane to probe for candidates but, again, it is recommended to run the DTA on those queries.
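A typical way to query the missing index DMVs is to join the three views and rank suggestions by a rough improvement estimate. This is a sketch using the standard `sys.dm_db_missing_index_*` views; the `improvement_measure` formula is a common heuristic, not an official metric, and the suggestions should always be sanity-checked (or run through the DTA) before creating indexes:

```sql
-- Rank missing-index suggestions by estimated benefit (heuristic only).
SELECT TOP (20)
    migs.avg_total_user_cost * migs.avg_user_impact
        * (migs.user_seeks + migs.user_scans) AS improvement_measure,
    mid.statement          AS table_name,
    mid.equality_columns,                 -- columns for the index key (equality)
    mid.inequality_columns,               -- columns for the index key (ranges)
    mid.included_columns                  -- candidates for INCLUDE
FROM sys.dm_db_missing_index_group_stats AS migs
JOIN sys.dm_db_missing_index_groups AS mig
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mig.index_handle = mid.index_handle
ORDER BY improvement_measure DESC;
```

Keep in mind that these DMVs are cleared on instance restart, so the numbers only reflect the workload since SQL Server last started.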
I had this situation a while ago when introducing a solution that monitors factory equipment and records its status. Initial tests completed successfully, but shortly after going live with the whole production floor, a new functionality, highly anticipated by users, suddenly brought the system to a halt, threatening to stop the business. We made the call to run Profiler in the production environment and tuned the queries manually, which also involved some assembly reverse engineering with the help of Red Gate’s Reflector to identify queries for later optimization. The whole situation was fixed within three days, but the second and third days were not looking so bright.