I spend quite a bit of time writing about query tuning on this blog. I’ve written (re-written, and am actively re-writing) books on query tuning. But what I like most is talking about query tuning. I love giving sessions at various events on different aspects of query tuning, but what I like best is spending a whole day trying to do a complete brain dump, getting as much information out there as possible. Sound attractive? Then I’ve got a great deal for you. Come to Louisville on June 20th, 2014. We will talk query tuning at length. You have a specific question? Let’s get it answered. Then, the next day, we can all go to SQL Saturday 286 there in Louisville for more learning and some serious networking. What’s not to like?
I am terribly jazzed to be involved with this amazing event, SQL Intersection. It features some truly amazing speakers presenting on important topics. It’s being held here on the East Coast, right near the Mouse, the Duck, and the Dog. This is one of those conferences you need to get to. Check out the lineup. Those are some of the smartest, most capable people I know. I’m quite humbled to be on the list with them, so I’ll do my level best to deliver good content. Look at the sessions. While I don’t know precisely when SQL Server 2014 is coming out, I’m sure it’s real soon, so this will be a great place to get a leg up on understanding what this new set of technology offers, or just to learn more about SQL Server in general, Azure, SSRS, and SSIS.
Click here now to register for this special event.
In case you don’t know, this query:
UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '%33%';
Will run quite a bit slower than this query:
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '333%';
Or this one:
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 = '333';
That’s because the second two queries have arguments in the filter criteria that allow SQL Server to use the statistics in an index to look for specific matching values, and then use the balanced tree (B-tree) of the index to retrieve specific rows. The argument in the first query requires a full scan against the index because there is no way to know in advance which values might match, and therefore no path through the index to simply retrieve them.
But, what if we do this:
UPDATE dbo.Test1
SET C2 = CASE WHEN C1 LIKE '19%' THEN 3
              WHEN C1 LIKE '25%' THEN 2
              WHEN C1 LIKE '37%' THEN 1
         END;
We’re avoiding that nasty wildcard search, right? So the optimizer should just be able to immediately find those values and retrieve them… Whoa! Hold up there, pardner. Let’s set up a full test:
IF (SELECT OBJECT_ID('Test1')) IS NOT NULL
    DROP TABLE dbo.Test1;
GO

CREATE TABLE dbo.Test1 (
    C1 VARCHAR(50),
    C2 INT,
    C3 INT IDENTITY
);

SELECT TOP 1500
    IDENTITY(INT, 1, 1) AS n
INTO #Nums
FROM Master.dbo.SysColumns AS sC1,
     Master.dbo.SysColumns AS sC2;

INSERT INTO dbo.Test1 (C1, C2)
SELECT n, n
FROM #Nums;

DROP TABLE #Nums;

CREATE CLUSTERED INDEX i1 ON dbo.Test1 (C1);

UPDATE dbo.Test1
SET C2 = CASE WHEN C1 LIKE '%42%' THEN 3
              WHEN C1 LIKE '%24%' THEN 2
              WHEN C1 LIKE '%36%' THEN 1
         END;

DBCC FREEPROCCACHE();

UPDATE dbo.Test1
SET C2 = CASE WHEN C1 LIKE '19%' THEN 33
              WHEN C1 LIKE '25%' THEN 222
              WHEN C1 LIKE '37%' THEN 11
              WHEN C1 LIKE '22%' THEN 5
         END;
I added the extra CASE evaluation in the second query in order to get a different query hash value.
Here are the execution plans from the two queries:
They’re pretty much identical. In fact, apart from me forcing a difference in the hash values, the only real difference is in the details of the Compute Scalar operator. So what’s going on? Shouldn’t that second query use the index to retrieve the values? After all, it avoided that nasty comparison operator, right? Well, yes, but… we introduced a function on the columns. What function, you ask? The CASE statement itself.
This means you can’t use a CASE statement in this manner, because it bypasses the index and statistics in exactly the same way that wrapping the columns in functions does.
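If you still need that conditional logic, one workaround is to break the CASE apart into separate statements with sargable predicates. Here’s a minimal sketch against the same dbo.Test1 table from the test above; it’s not a drop-in replacement, because the original CASE, having no ELSE, also sets every non-matching row’s C2 to NULL:

-- A minimal sketch, not a definitive rewrite: each UPDATE has a sargable
-- leading-string predicate, so the optimizer can seek on the clustered index i1.
-- Unlike the CASE version, these statements leave non-matching rows untouched.
UPDATE dbo.Test1 SET C2 = 3 WHERE C1 LIKE '19%';
UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '25%';
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '37%';

Check the execution plans to confirm the seeks, and test whether three passes actually beat one scan for your data.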
Many years ago, I was working with a great DBA. Seriously, a very smart and capable guy. He told me, “We need to put the database into source control, just like app code.” And I just laughed. Not because I disagreed with him. I knew he was right, but I had tried, several times, to do just that. See, I’m not really a DBA. I’m a developer. I knew that code (and all the T-SQL that describes databases is code) needed to be versioned, sourced, tracked and audited. But great googly moogly, it was not an easy thing to do.
I first tried just exporting the entire database into a script and then occasionally checking that script into source control. Yay! Mission Accomplished… Well, I had a database in source control, yes, but I didn’t have any of the great stuff that went with it, most of all, a way to deploy from source control.
Next, I tried just storing the stuff that changed most: procedures. But I had to store everything either as an ALTER, or as a DROP/CREATE along with the security settings and extended properties. I tried both. Neither approach satisfied, and it was WAY too easy for someone else to modify a script the wrong way and bring the entire thing crashing down. Not to mention the fact that any and all structural changes outside of stored procedures had to be built manually, or generated with a compare tool (but not the procs, because we have those in source control, remember) by comparing prod & dev, or qa & dev, or something & something… Oh yeah, that was fun.
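For anyone who never lived through this, here’s roughly what every one of those versioned scripts had to look like (object and role names here are hypothetical, just for illustration):

-- The DROP/CREATE pattern described above, with hypothetical names.
IF OBJECT_ID('dbo.InvoiceGet') IS NOT NULL
    DROP PROCEDURE dbo.InvoiceGet;
GO

CREATE PROCEDURE dbo.InvoiceGet (@InvoiceID INT)
AS
SELECT i.InvoiceID,
       i.InvoiceDate
FROM dbo.Invoice AS i
WHERE i.InvoiceID = @InvoiceID;
GO

-- Because DROP/CREATE throws away the existing permissions, the security
-- settings (and any extended properties) had to ride along in the same script.
GRANT EXECUTE ON dbo.InvoiceGet TO AppRole;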
Man, it was painful back then. But now, there are several ways you can do this using Microsoft and/or 3rd party tools.
Why aren’t you?
Seriously, most of you aren’t. I’ve been going all over the country teaching a class on database deployments (the next one is in Cleveland if you’re interested), and I know most people don’t put their databases into source control. Of course, I’m pretty sure most people don’t talk to their Dev teams if they can help it, and most Dev teams seem to be on less than a perfectly chatty basis with their DBAs. Is that the cause? The thing is, you are experiencing pain in terms of time spent, mistakes made, slower deployments, less frequent deployments, possibly even downtime, all because you don’t do your deployments right. And deployments start from source control.
Developers have spent the last 30 years or so figuring out better and better ways to get functional code into the hands of their customers (internal or external) as fast as possible, as accurately as possible. Over the same 30 years, DBAs have been figuring out how to better and better protect the information under our charge, ensuring that it’s backed up, available, performing well, and always on (to use a phrase). My suggestion to you, data pro: talk to your developers. Figure out what they do and how they do it. Take advantage of their years and years of process improvement and apply what they’ve learned to your database development and deployment.
There’s a new concept growing out there, fairly well established within the *nix communities: DevOps. It’s the realization that the world doesn’t begin and end with your database/server/application. Instead, your database/server/application is completely dependent on the database/server/application that it needs in order to run. Notice, no one is more important here. We’re talking about creating mechanisms for teams to deliver functionality to their clients (internal or external). And it all starts with the realization that there are parts of the process that some of us are better at than others. Developers know how to develop and deploy within teams. Let’s learn from them. And the start? Well, that’s source control.
So, is your database under source control… for real? If not, get it there. The excuses I used to have are gone. That means the excuses you have now are probably gone too.
Fair warning, I may use the term DevOps more in the future.
Let’s have some fun.
This Friday, November 1, 2013, I’m putting on an all day seminar on query tuning. It’s set up as a pre-conference event for SQL Saturday 255 in Dallas. It’s a 200 level course on understanding how the query optimizer works, the importance of statistics, constraints and indexes, how to read execution plans, and how to take all that knowledge and go to work on tuning your queries.
Here’s the fun. Sign up for the seminar, and bring a nasty query you’ve been trying to tune or a query you don’t understand or an execution plan that’s making you crazy. Depending on the time available near the end of the day, we’ll walk through a few of them. I’ve slightly restructured the seminar so I have some flexibility near the end to do this. It should be a blast. Don’t bring anything that contains sensitive data because I’m going to put it up on the big board in front of your peers. Also, don’t worry about blame. We all know Timmy wrote that query, not you.
There are only a couple of days left to sign up. If you are in the Dallas area and you want to learn query tuning, and maybe have a giggle along the way, please, click the link and register now.
I was presenting a session on how to read execution plans when I received a question: do you have a specific example of how you can use the query hash to identify similar query plans? I do, but I couldn’t show it right then, so the person asking requested this blog post.
If you’re dealing with lots of application-generated, dynamic, or ad hoc T-SQL queries, then attempting to identify tuning opportunities, missing indexes, incorrect structures, etc., becomes much more difficult because you don’t have a single place to go to see what’s happening. Each ad hoc query looks different… or do they? Introduced in SQL Server 2008 and exposed through the standard Dynamic Management Objects (DMO), the query hash gives us a mechanism to identify ad hoc queries that are similar in structure.
Query hash values are available in the following DMOs: sys.dm_exec_requests and sys.dm_exec_query_stats. Those two cover pretty much everything you need: what’s executing right now, and what has recently executed (well, what was recently executed and is still in cache; if a query isn’t in cache, you won’t see it). The query hash value itself is nothing other than the output from a hash mechanism. A hash is a formula that outputs a value based on its input; for the same, or similar, input, you get the same value. There’s also a query_plan_hash value, which is a hash of the execution plan.
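For instance, a quick way to see the hash values for anything executing right now is something along these lines (this is just a sketch; the session_id filter is a rough heuristic to skip most system sessions):

-- Hash values for currently executing requests, via sys.dm_exec_requests.
SELECT der.session_id,
       der.query_hash,
       der.query_plan_hash,
       dest.text
FROM sys.dm_exec_requests AS der
CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS dest
WHERE der.session_id > 50; -- rough filter to skip most system sessions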
Let’s see this in action. Here is a query:
SELECT soh.AccountNumber,
       soh.DueDate,
       sod.OrderQty,
       p.Name
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product AS p
    ON p.ProductID = sod.ProductID
WHERE p.Name LIKE 'Flat%';
And if I modify it just a little, like you might with dynamically generated code:
SELECT soh.AccountNumber,
       soh.DueDate,
       sod.OrderQty,
       p.Name
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product AS p
    ON p.ProductID = sod.ProductID
WHERE p.Name LIKE 'HL%';
What I’ve got is essentially the same query. Yes, if I were to parameterize that WHERE clause by creating a stored procedure or a parameterized query it would be identical, but in this case it is not. In the strictest terms, those are two different strings. But, they both hash to the same value: 0x5A5A6D8B2DA72E25. But, here’s where it gets fun. The first query returns no rows at all, but the second returns 8,534. They have identical query hash values, but utterly different execution plans:
Now, how to use this? First, let’s say you’re looking at the second execution plan. You note the scan of the SalesOrderHeader clustered index and decide you might want to add an index here, but you’re unsure of how many other queries behave like this one. You first look at the properties of the SELECT operator. That actually contains the query hash value. You get that value and plug it into a query something like this:
SELECT deqs.query_hash,
       deqs.query_plan_hash,
       deqp.query_plan,
       dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqs.query_hash = 0x5A5A6D8B2DA72E25;
This resulted in eight rows. Yeah, eight. That’s because I had run this same query a number of times in different ways and combinations, so the query hash, which is generated per statement, was common to all of them, even though there were all different sorts of plans and issues. But one common thread ran through them all: a scan on the clustered index as the most expensive operator. So, now that I can identify lots of different common access paths, all of which share a common problem, I can propose a solution that will help out in multiple locations.
The problem with this is, what if I modify the query like this:
SELECT soh.AccountNumber,
       soh.DueDate,
       sod.OrderQty,
       p.Name,
       p.Color
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product AS p
    ON p.ProductID = sod.ProductID
WHERE p.Name LIKE 'HL%';
Despite the fact that this query has an identical query plan to the one above, the hash value, because of the added p.Color column, is now 0xABB5AFDC5A4A6988. But, worth noting, the query_plan_hash values for both these queries are the same, so you can do the same search for common plans with different queries to identify potential tuning opportunities.
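That search is just a variation on the earlier query. Something like this works (the hash here is a placeholder; plug in the query_plan_hash value from the SELECT operator’s properties):

-- Same idea as the query-hash search above, but keyed on the plan hash.
DECLARE @PlanHash BINARY(8) = 0x0000000000000000; -- placeholder; use your real value

SELECT deqs.query_hash,
       deqs.query_plan_hash,
       deqp.query_plan,
       dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqs.query_plan_hash = @PlanHash;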
So, the ability to pull this information is not a magic bullet that will help you solve all your ad hoc and dynamic T-SQL issues. It’s just another tool in the tool box.
UPDATE: In the text I had incorrectly said this was introduced in 2005. It was 2008. I’ve updated it and added this little footnote. Sorry for any confusion.
When I first saw this question I thought to myself, “Self. Don’t you think that’s comparing apples to hammers? Yes, Self, I’m pretty sure it is. Good, I thought so too, self. Yeah, me too.” After rebooting because of the runaway iterations on that thought, I had another, “Well… hold on there self. Both types of objects, while pretty different, are taken into account by the query optimizer.” I then had to admit to myself that I had a point. So the question remains, are foreign keys better than indexes?
As my first self said, these are different objects and they fulfill different purposes within SQL Server. My second self wants to point out that when you’re dealing with functional objects within SQL Server, it’s a bad habit to start equating one to the other, especially when they have such radically different functions. I, myself, know that an index is meant to be a mechanism to speed the retrieval of data, while a foreign key is meant to constrain data in a relational fashion between two tables, enforcing that the data in one table can only contain values from the primary key (or a unique constraint) of the other table. But, my other self points out, indexes also work as constraints. As mentioned, you have a primary key, which is realized as an index on the table (clustered by default, but it doesn’t have to be). And, in addition to the primary key, you can have a unique constraint, also realized as an index on the table. So these constructs are similar to foreign keys. But they’re very different. A foreign key simply validates, prior to allowing the data to be added to the table in question, that the values being added (or modified) match values in the key. That’s it. Indexes, unique or not, are a secondary storage mechanism within SQL Server. Clustered indexes actually are the table: they have the structure of an index, the balanced tree or B-tree, AND, at the leaf level, they have all the data that defines the table. Non-clustered indexes are sort of the same. They have a B-tree, just the same, and they have leaf levels that might have columns that are INCLUDEd in their storage.
Assuming your foreign key is enforced, meaning it was created WITH CHECK or a check was run after it was created, the optimizer can take advantage of the knowledge that data within the foreign key must match data within the parent table. This allows for better optimization of joins, even elimination of joins (see an example of this here). But that’s not the same as what the optimizer does with indexes. Again, indexes are where the data is, so the optimizer is going to make a great deal of use of them to retrieve data in an optimal fashion, simply because that’s where the data is located (or can be found, if we’re talking key lookups and things like that).
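If you want to see the join elimination behavior for yourself, a minimal sketch looks something like this (the table and column names are hypothetical, invented for illustration):

-- Two hypothetical tables with a trusted foreign key (WITH CHECK is the default).
CREATE TABLE dbo.Parent (
    ParentID INT NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.Child (
    ChildID INT NOT NULL PRIMARY KEY,
    ParentID INT NOT NULL
        CONSTRAINT FK_Child_Parent REFERENCES dbo.Parent (ParentID)
);
GO

-- Only Child columns are selected, and the trusted, NOT NULL foreign key
-- guarantees every Child.ParentID has a match, so the optimizer can drop
-- the join to Parent entirely. Check the execution plan to confirm.
SELECT c.ChildID
FROM dbo.Child AS c
JOIN dbo.Parent AS p
    ON p.ParentID = c.ParentID;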
So, while comparing apples & hammers, uh, I mean, foreign keys and indexes, you do hit the concept of constraints for both, but the mechanisms associated with indexes are very different from those associated with the foreign key, because indexes have storage while the foreign key does not.
I’m honestly not crazy about dynamic T-SQL within stored procedures. There are just a few too many opportunities to mess it up, with dire consequences for your server and your data. However, I absolutely recognize that dynamic T-SQL may be needed and, in some situations, may be the best way to solve a problem. If you must use dynamic T-SQL, there are ways of doing it that are much more efficient than others. The very best thing you can do if you need to build dynamic strings of T-SQL to execute in your stored procedures is to use sp_executesql.
The main reason I advocate for sp_executesql is because you can build out completely dynamic strings of T-SQL, but, you can still take advantage of parameters. Parameters help you avoid a chat with the parents of Bobby Tables and can get you much more plan reuse. However, it’s worth noting that, because these are parameters, you’re going to have to deal with the good, and the bad, of parameter sniffing. Let’s see it at work.
Here’s a stored proc that uses sp_executesql (for no good reason, but this is just an example):
CREATE PROCEDURE dbo.DynamicAddressByCity (@City NVARCHAR(30))
AS
DECLARE @TSQL NVARCHAR(MAX),
        @Params NVARCHAR(MAX);

SET @TSQL = N'SELECT a.AddressID,
                     a.AddressLine1,
                     a.AddressLine2,
                     a.City,
                     sp.[Name] AS StateProvinceName,
                     a.PostalCode
              FROM Person.Address AS a
              JOIN Person.StateProvince AS sp
                  ON a.StateProvinceID = sp.StateProvinceID
              WHERE a.City = @City;';

SET @Params = N'@City NVARCHAR(30)';

EXECUTE sp_executesql @TSQL, @Params, @City = @City;
If I execute this query:
EXEC dbo.DynamicAddressByCity @City = N'London';
Then I’ll get an execution plan that looks like this:
If I were to re-execute the query using a different parameter:
EXEC dbo.DynamicAddressByCity @City = N'Mendon';
I’ll end up with the same execution plan. If we look at the properties for the SELECT operator:
We can see that the plan was compiled using a parameter value of ‘London’ and executed using a parameter value of ‘Mendon.’ I already hear people saying, “So what?” Well, let’s remove the query from cache. I’ll query the Dynamic Management Objects (DMO) to get the plan handle and then remove the plan from cache using DBCC FREEPROCCACHE with that plan handle (there’s a sketch of those statements below). Now, we’ll re-execute the query, but using the parameter value of ‘Mendon.’ Here is the resulting execution plan:
Yeah, different, right? That’s because of the differences in the statistics for the two values: ‘London’ will return over four hundred rows while ‘Mendon’ will only return two. Those differences, accurately portrayed within the statistics for the column, result in different execution plans because the parameters were ‘sniffed’.
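In case you want to reproduce that cache-eviction step, it looks something like this (a sketch; the LIKE filter is just one convenient way to locate this particular plan, so adjust it to taste):

-- Find the plan handle for the procedure's cached plan, then evict it.
DECLARE @PlanHandle VARBINARY(64);

SELECT @PlanHandle = deqs.plan_handle
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE dest.text LIKE '%DynamicAddressByCity%';

IF @PlanHandle IS NOT NULL
    DBCC FREEPROCCACHE(@PlanHandle);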
Parameter sniffing absolutely applies to sp_executesql. While you should be using those parameters, don’t lose sight of the fact that this could lead to bad parameter sniffing.
For lots more on query tuning, if you’re in the Dallas area, I’ll be putting on an all-day pre-conference seminar on Friday, November 1, 2013, before SQL Saturday 255. You can go here to sign up for the event. Hurry, the early bird offer is going to expire soon.
Let’s be honest. Database administration is not all that tough. Set up your backups. Test them. Get consistency checks running. Some maintenance routines for statistics, and maybe for fragmentation, are also helpful. Set up security. In most of the important ways, you’re done. Sit back, monitor the whole thing, and collect your paycheck.
Yeah, well, it’s a nice dream.
The reality is that you’re dealing with constant change that throws this simple set of maintenance and monitoring tasks into the garbage. No, you’ve got new databases under development. Sometimes by development teams that are absolute rocket scientists, and you sit at their feet learning how they did some really cool piece of code. And sometimes by development teams that more resemble crazed monkeys throwing poo at the walls to see what sticks. There are upgrades to the third-party tools your company uses to manage its finances/CRM/whatever. OS upgrades. SQL Server upgrades. You have changes to your data, which introduce new performance problems. Parameter sniffing. What do you mean I have to put a WHERE clause on the DELETE statement? On, and on, and on. I’m not even scratching the surface of all the things that are constantly shifting under your feet or going wrong. No, this job is hard.
You have the tools you need to do everything. There’s Transact-SQL (T-SQL), which lets you manipulate data and objects. You’ve got SQL Server Management Studio (SSMS), a graphical user interface for manipulating the server, the databases, the objects inside the databases, security, scheduling, some monitoring, performance metrics, and more. You’ve got a powerful scripting language, PowerShell, to automate a lot of your work. Everything you need is already in your hands except one thing: time.
That’s where Red Gate Software can help. We recently modified our Database Administrator Bundle. Some of our toolset was previously only available through either the entire software set in the Toolbelt or through the Database Developer Bundle. But these tools are needed by DBAs. Yes, we’re still including the vital tools that are absolutely at the foundation of what you do as a DBA: SQL Backup Pro to get your backups in hand, SQL Monitor to keep an eye on your servers and databases, and SQL Multi Script, which lets you run distributed scripts across multiple machines & multiple databases. But now you’ve also got what were considered “developer”-centric tools. Let’s face it: yes, these are tools used when developing databases, but, as a DBA, you write as much T-SQL as any developer, probably more, so having a copy of SQL Prompt is a must. You’re maintaining production databases, yes, but you’ve also got multiple test, development, and other environments. So SQL Compare and SQL Data Compare are a major piece of a well-appointed toolset.
In short, Red Gate reoriented the DBA Bundle in ways that are going to provide a positive impact in helping you do the hard work of database administration.
Want to start a fight between a bunch of DBAs or database developers? Ask them where the comma should be placed between columns in a SELECT statement. It’s actually a little disturbing just how much emotional weight people place on this. Almost as good: tell a database developer you don’t like their naming standard. Hoo boy!
The purpose of a naming standard, I think most of us can agree, is for clarity. We’re defining language so that we all mean the same thing when we say it, right? We want to communicate clearly, so we’re going to implement a naming standard. Fine. Sounds good. And then, you get this:
Umm… Wait. What?
And there are hundreds of these, all with the same first three letters, tbl. Oh, wait, I get it. It’s tbl for Table. Because… um, we need to know what tables in our database are actually tables?
Where’s the clarity?
What about the rest of it, Ddltbl? If you’ve been to any presentation where I’ve explained Ddltbl (pronounced “diddletibble”), you must not reveal the secret. I’ll delete your comments. But feel free to guess if you don’t know. No one has ever guessed correctly. Why? Because it’s not even remotely clear. It’s an abbreviation, but not one you’ll find in a dictionary. No, instead, you’ll have to look it up in a special list that matches weird abbreviations, some of them weirder than Ddltbl, with real words. Clarity IS NULL.
OK, what about using spr_ & spu_ for stored procedures? Spr_ means a read procedure and spu_ means update. Clear, right? And it’s clearly expandable to insert & delete & merge and procedures that do a mix of things… Wait, we can’t use spm_ for Mix because we’re using that for Merge. I know, we’ll use spx_ for procedures that do a mix of things. What’s that? We already used spx_ for XML queries? What idiot did that? Me? Oh. Well, let’s see, we’ll use spq_ for procedures that do a mix of things… And clarity just went away again. Not to mention the fact that when I want to insert data into the InvoiceDescription table, to get type-ahead I have to type ‘spi_I’ before I even get close to being able to select the correct procedure from code completion. That’s five characters. I’m lazy, doggone it! I don’t want to type that much.
Let me now stand up and admit, WAY back at the beginning of my career in data management, I created an entire system with tables that were prefixed ‘tbl’ because I came out of development that used, what was called at the time Hungarian Notation, which was 1-3 characters before a variable to describe that variable. It was a holdover from the days when we needed to account for every letter used because the amount of memory & storage in systems was very tiny. By the gods, I even named every column with a prefix denoting its data type. And, I came up with the spr_, spu_, spd_, spi_, spm_, spq_, spx_ naming convention too (to my utter shame, and I think we had one or two others tossed in there that I don’t remember). Yes, I’ve inflicted this craziness on others. That’s a huge part of why I know it needs to stop (and no, I was NOT responsible for Ddltbl, I fought against it tooth & nail).
We actually have a common language that we can use to communicate clearly what the objects within our databases are and what they do. It’s the written and spoken language that we use to communicate with each other when we’re not creating databases. In my case, that’s English. Why don’t we just say that the table is Invoice (or Invoices, I’m not about to quibble on that) and that the procedure for adding data to the Invoice table is InvoiceAdd or InvoiceInsert? Why don’t we just use our language skills? Anyone who has had young children has repeated the phrase “Use your words” more than any of us care to think about. Let’s do the same thing with our database objects. Use your words!
Oh, and you should put your comma after the column.