There’s an old joke that goes, “Doctor, doctor, it hurts when I do this.” While the person in question swings their arm over their head. The doctor’s response is, “Don’t do that.” Problem solved, right? Well, maybe not. Let’s take a quick example from life. I do crossfit (yeah, I’m one of those, pull up a chair I’ll tell you all about my clean & jerk progress… kidding). I’ve been experiencing pain in my shoulder. “It hurts when I do this.” But, I’m not going to stop. I’ve been working with my coach to identify where the pain is and what stretches and warm-ups I can do to get around it (assuming it’s not a real injury, and it isn’t). In short, we’re identifying the root cause and addressing the issue rather than just coming up with a “don’t do that” style solution.
As is usual with one of my rambling intros, I’m going to tie this back to our work as data professionals, hang on.
I recently ran into a set of requirements that a new DBA was told are “best practices” and that he had to comply with. I’m editing them in order to save anyone embarrassment (although, if you wrote these, you should be embarrassed). I think that each one represents a moment of “it hurts” followed by “don’t do that” which, as you’re going to see, is absolutely the wrong response. As crazy as they all are, it’s not the first time I’ve seen them. This information is clearly coming from some fever-filled internet-swamp where stupid mutates and grows. Let’s nip these in the bud.
1 – All queries against the production database must be executed by using RDP to connect to the production system. This is because connecting through SSMS will cause the server to crash
I couldn’t even begin to guess what happened that caused a system outage because of a query from SSMS, but I suspect it was some sort of crazy, extreme outlier of a query that probably didn’t run much better while RDPed into the server, but somehow avoided a network issue (or six). Who knows. Suffice to say, no. This is crazy. On multiple levels. The most important being, you’re giving people access to the operating system on production that really probably shouldn’t have it. And, you can only have one person connecting to production at any one time. Two teams ready for a deployment? Tough. Oh, and your production system, in addition to all the work it’s doing for SQL Server, it now has to handle all the work of SSMS and displaying the results of your queries. Was your server under stress when you started? It’s worse now. I really don’t know of anyone, including a very large number of remote DBAs, who don’t connect to SQL Server through SSMS running on their desktop.
2 – NOLOCK is required on every query. It makes things run faster.
The magic Turbo Button of SQL Server rears its ugly head, yet again. I suspect that if you went and looked at all the responses to questions on forums, blog posts, articles and presentations, there are more places defining exactly why this is bad than almost any topic except backups. One more time. Using NOLOCK on all your queries… and if you’re putting it everywhere anyway, why not just use READ_UNCOMMITTED on your connections, it’s easier… Anyway, Using NOLOCK on your queries results in dirty reads. Yes, everyone says, “Oh, that means if someone is change ‘dog’ to ‘cat’ I could see either ‘dog’ or ‘cat’. I don’t care.” Right. That’s true. It also means when pages split and rearrange, you may miss rows or have rows duplicated. All the businesses I know love it when data is missing from the queries they’re running. I’m sure you wouldn’t mind if your bank used NOLOCK and told you that you didn’t have any money in it, right?
3 – Transactions take the server offline. Never use them in queries.
Again, I’m having a hard time imagining what exactly occurred to make this seem like a good idea. You do know that every query has an implicit transaction? And your server is still online. Can you over use transactions, your transactions can be overly large, your transactions can run too long, or you can have too many transactions. But no, just using a transaction will not bring the server down.
4 – Do not JOIN tables because it leads to performance problems.
I’m actually used to a slight variation on this, “Never join more than three tables.” This is just a holdover from the old days when several things were going on. One, the old query optimizers just sucked, so choices on loops or merge or hash joins were really bad. We also didn’t really know how to index our tables properly back in the day (well, I didn’t), so it hurt performance a lot to join between tables. Our code was horrendous back in the day too, so it didn’t help. But, since at least SQL Server 2000, the optimizer is good, performance is fine. I saw an 86 table join (not one I wrote) run in under 200ms on SQL Server 2000 on ancient processors and small memory. Talk about completely busting that myth. This rule is crazy. Seriously crazy. You’re using a RELATIONAL storage engine and then saying that you can’t actually use relationships. If you don’t have or need relational data, certainly a very valid option, use a non-relational data storage engine. But if you have relational data and you’re using a relational engine, I strongly, very strongly, recommend you use the all the relational tools. That includes JOINs but also includes, primary keys, foreign keys, enforced referential integrity, and all the rest.
5 – Don’t use SSMS’s ‘Edit Top 200 Rows’ because it leaves locks on the table.
One I actually agree with. But not because of what they’re saying. I suspect someone must have found a situation where the UPDATE process from this window held a lock. But I don’t think this is a good way to work because I don’t like the Top 200 rows approach because, well, here’s a query from this menu choice in SSMS:
SELECT TOP (200) BusinessEntityID, CreditCardID, ModifiedDate FROM Sales.PersonCreditCard
Spot any issues? Apart from the missing semi-colon and missing column owners? Yeah a top without an ORDER BY. Which 200 rows are we getting? Yes. If you want to edit data, use an UPDATE command or a proper interface for the data, not this.
6 – Stored procedures should never be used. Prepared statements should never be used.
Plan cache should never be used. Oh, I thought that was the next statement. Look, we don’t have to get into the stored procedures are a blessing from Odin vs. stored procedures are a frost giant trick argument. We can say that using parameterized queries (stored procedures or prepared statements) leads to the reuse of plans for cache. Whereas, writing nothing but ad hoc queries results in massive amounts of queries piled into the plan cache, running through the full optimization process, chewing up your memory, your CPU time, and then you never use them ever again. Can you think of a more efficient mechanism for hurting your performance than taking away these tools? Me neither. Parameterize your queries, somehow. Every ORM tool I’ve worked with or read about can do this (and they can all use stored procedures too, just saying). There is no good excuse for not using parameterized queries for most of your code. And yes, there are places where ad hoc or dynamic T-SQL makes sense. But I’d argue that they’re the exception, not the rule.
Please, take the time to understand what went wrong when you hit a problem. Don’t just shy away from the thing associated with it. It’s like the Dark Ages when they executed inanimate objects after an accident. Surely we’re a little more sophisticated than that. Identify the problem, understand it, then apply appropriate solutions.
If you want to talk more about performance tuning, and you’re attending SQL Rally Nordic 2015 in Copenhagen, I’m doing a full day pre-conference seminar. Click here now to register.