I’m putting on a pre-conference seminar (also known as a pre-con) at the PASS Summit this year. I’m really honored to be able to present this and I’m pretty excited about it. So, if you want to talk query tuning, let’s get together at the Summit. For a few fun facts about the event, check out this Q&A over at PASS. To register for the event and my pre-con, go here now.
No, this is not about politics. It’s about your WHERE clause… and your JOIN criteria… and your HAVING clause. It’s about a canard that still makes the rounds occasionally. Please, help me put this statement to sleep once and for all:
A function on the left side of the equals sign can lead to performance problems
Well, you know, it’s sort of true. But then, a function on the right side of the equals sign can also lead to performance problems. In short, it’s not the placement of the function that causes issues, it’s the function that causes issues. Let’s take a look at a really simple example:
SELECT a.AddressID,
       a.AddressLine1,
       a.AddressLine2
FROM Person.Address AS a
WHERE a.AddressLine1 = 'Downshire Way';
This simple query results in an equally simple execution plan:
Now, suppose we decide that we want to do something like look for all the addresses that end in ‘Way’. It’s a different result set, but our index could still be used to satisfy it. The query gets modified to this:
SELECT a.AddressID,
       a.AddressLine1,
       a.AddressLine2
FROM Person.Address AS a
WHERE RIGHT(a.AddressLine1, 3) = 'Way';
That’s a function on the left side of the equals sign. OMG!!1! The execution plan isn’t as nice any more:
So, if we change the query to this:
SELECT a.AddressID,
       a.AddressLine1,
       a.AddressLine2
FROM Person.Address AS a
WHERE 'Way' = RIGHT(a.AddressLine1, 3);
Whew, dodged a bullet since we have the function on the right side of the equals sign. And so we get a better execution plan now:
Uhm, wait. That’s still a bad plan, isn’t it? Why yes, yes it is. That’s because the problem isn’t which side of the equals sign the function is on, but the fact that we have a function on the column at all. There are a number of permutations we can get into around this. For example, what if, instead of putting the function on the column, we put it on the string, matching only to ‘Way’ instead of ‘Downshire Way’? Well, that would fix the function issue, but then we’d have to use a LIKE operator and add a wildcard to the beginning of the string, resulting in scans again. But the fundamental concern remains: we’re not talking about the left or right side of the comparison operator, we’re talking about the existence of the function on the column.
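To make those permutations concrete, here’s a hedged sketch (assuming the AdventureWorks Person.Address table used above) of the three variations: function on the column, wildcard on the string, and a predicate with a known leading portion. Only the last one gives the optimizer a chance at an index seek.

```sql
-- Function on the column: no seek, regardless of which side of the = it sits on.
SELECT a.AddressID, a.AddressLine1, a.AddressLine2
FROM Person.Address AS a
WHERE RIGHT(a.AddressLine1, 3) = 'Way';

-- Function removed, but the leading wildcard still forces a scan.
SELECT a.AddressID, a.AddressLine1, a.AddressLine2
FROM Person.Address AS a
WHERE a.AddressLine1 LIKE '%Way';

-- A predicate with a known leading portion is sargable and can seek.
SELECT a.AddressID, a.AddressLine1, a.AddressLine2
FROM Person.Address AS a
WHERE a.AddressLine1 LIKE 'Downshire%';
```

The third query answers a different question, of course; the point is only that sargability, not equals-sign placement, decides whether the index can be seeked.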
Please, don’t repeat this one any more. OK? Thanks.
I’m excited to be able to say that I’ve been given the opportunity to put on a full day workshop at SQL Connections on Friday, September 19th, 2014. The title is “Query Performance Tuning in SQL Server 2014”, but I assure you we’re going to cover things that are applicable if you’re still working on SQL Server 2005. We’ll start the day covering the different mechanisms you have for capturing query metrics. We’ll go over the dynamic management objects and extended events that are incredibly important in understanding which queries you need to tune. We’ll get an introduction to how the optimizer works and the role that statistics, indexes and constraints play in helping the optimizer make the choices it makes. I promise, execution plans will be covered throughout the day, in great detail, because they are a fundamental part of how you’re going to understand the choices the optimizer made. Then, we’re going to go through common problems, how you can identify them, troubleshoot them, and solve them. We’ll clear away a lot of myths and knock out some easily fixed issues. Throughout the day we’ll be covering both older versions of SQL Server as well as SQL Server 2014. Then, to finish out the day, we’ll go over some of the new opportunities that are unique to SQL Server 2014, their use and their shortcomings.
In short, I’m trying to take you from not knowing which queries you need to tune, to identifying those problematic queries, understanding what the problems are and how to solve them. You’re going to know where to go to get started reading execution plans. You’re going to walk away with a love and fascination for extended events. You’re going to get tools and methods that you can apply to your own code, your own applications, your own servers. And, this all takes place after an amazing week of learning at the IT/Dev Connections event in Vegas. Please click here now to register.
I almost forgot to tell you about the Database Administration Virtual Chapter meeting next week, March 26th, 2014. I’ll be doing a talk about query tuning in Windows Azure SQL Database. It’s a talk I’ve given before (it was in the top 10 at the PASS Summit last year). Come find out why you’ll need to tune queries in WASD, the tools you get, and the glorious fact that you’ll actually be actively saving your business money by tuning queries! Click here now to register.
It really is so much easier to just throw hardware at badly performing databases. Just buy a bigger, faster server with more and faster disks and you can put off doing tuning work for another 6-9 months, easily. But for most of us, sooner or later, our performance problems get so big, or we just don’t have any more money to spend, and we’re stuck. We have to tune the queries. And frankly, query tuning is a pain in the nether regions.
But after you’ve tuned queries 20 or 30 times, you start to recognize the patterns and it gets easier (never easy, just not as hard). But if you haven’t done it 20 or 30 times, what do you do? My suggestion: talk to someone who has done it 30 times (or even a couple of hundred times), like me, for example.
I have an all day session on tuning queries. It goes from understanding how the optimizer works (which will automatically lead you to write better queries), to how to gather performance metrics (so you know where the pain points are located), to reading execution plans (you need to know what has gone wrong with the query) to various mechanisms for fixing the query. This information is applicable to systems from SQL Server 2005 to SQL Server 2014 (sorry everyone still on 2000, it’s time to upgrade). The session is based on the books I’ve written about query tuning and execution plans, plus years and years of doing lots of query tuning.
Right now I’ve got two events scheduled. Before SQL Saturday #286 in Louisville, KY, I’ll be putting on this precon. Seating is limited, so don’t wait. You can go here to register. Then we can get together the next day at the SQL Saturday event to get some more education from all the great speakers there. Next, before SQL Saturday #302 in Albany, NY (their first one, ever), I’ll be hosting this. You can register by clicking here. Don’t miss the early bird special. Again, the next day will be filled with learning at the SQL Saturday event.
I’m working on taking this to other locations and venues. If you’re interested, please get in touch. I’ll do what I can to come to you.
If you have a particularly thorny query, bring it along with an actual execution plan. If we have time at the end of the day, I’ll take a look and make suggestions, live (uh, please, no sensitive patient data or anything like that).
Let’s get together and talk query tuning.
No, I don’t mean the use of sp_updatestats is not smart. It’s a fine, quick mechanism for getting statistics updated in your system. But the procedure itself is not smart. I keep seeing stuff like “sp_updatestats knows which statistics need to be updated” and similar statements.
Wanna know how I know? It’s tricky. Ready? I looked at the query. It’s there, in full, at the bottom of the article (2014 CTP2 version, just in case yours is slightly different, like, for example, no Hekaton logic). Let’s focus on just this bit:
if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))
The most interesting part is right at the front, @ind_rowmodctr <> 0. That value is loaded with the cursor and comes from sys.sysindexes and the rowmodctr column there. In short, we can know that the “smart” aspect of sp_updatestats is that it assumes if there are any modifications, then updating statistics is good to go. We can argue for hours over how exactly you determine whether or not statistics are sufficiently out of date to warrant an update, but I’d be willing to bet that the sophisticated answers are not going to include just finding everything that’s been touched.
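If you want to look at the same counter the procedure is reading, something like this sketch works (Person.Address is just an example table; run it in your own database). Note that sys.sysindexes is a deprecated compatibility view, used here only because it’s what sp_updatestats itself queries; on SQL Server 2008 R2 SP2, 2012 SP1 and later, sys.dm_db_stats_properties exposes the modification counter alongside the row count, which lets you build a far more sophisticated “is it out of date” test than “anything touched at all.”

```sql
-- The counter sp_updatestats checks: any nonzero value triggers an update.
SELECT OBJECT_NAME(si.id) AS table_name,
       si.name AS index_name,
       si.rowmodctr
FROM sys.sysindexes AS si
WHERE si.id = OBJECT_ID('Person.Address')
      AND si.indid > 0;

-- A richer view on SQL Server 2008 R2 SP2 / 2012 SP1 and later:
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('Person.Address');
```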
Now, don’t get me wrong. I’m not implying, suggesting or stating that sp_updatestats shouldn’t be used. It should. It’s fine. Just be very clear about what it does and how it does it.
Just a reminder, I’m putting on an all day seminar on query tuning in Louisville on June 20th, 2014. Seats are going fast, so please sign up early.
USE [master]
GO
/****** Object: StoredProcedure [sys].[sp_updatestats] Script Date: 3/6/2014 8:09:58 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER procedure [sys].[sp_updatestats]
    @resample char(8)='NO'
as
    declare @dbsid varbinary(85)

    select @dbsid = owner_sid
    from sys.databases
    where name = db_name()

    -- Check the user sysadmin
    if not is_srvrolemember('sysadmin') = 1 and suser_sid() <> @dbsid
    begin
        raiserror(15247,-1,-1)
        return (1)
    end

    -- cannot execute against R/O databases
    if DATABASEPROPERTYEX(db_name(), 'Updateability')=N'READ_ONLY'
    begin
        raiserror(15635,-1,-1,N'sp_updatestats')
        return (1)
    end

    if upper(@resample)<>'RESAMPLE' and upper(@resample)<>'NO'
    begin
        raiserror(14138, -1, -1, @resample)
        return (1)
    end

    -- required so it can update stats on ICC/IVs
    set ansi_warnings on
    set ansi_padding on
    set arithabort on
    set concat_null_yields_null on
    set numeric_roundabort off

    declare @exec_stmt nvarchar(4000)      -- "UPDATE STATISTICS [sysname].[sysname] [sysname] WITH RESAMPLE NORECOMPUTE"
    declare @exec_stmt_head nvarchar(4000) -- "UPDATE STATISTICS [sysname].[sysname] "
    declare @options nvarchar(100)         -- "RESAMPLE NORECOMPUTE"
    declare @index_names cursor
    declare @ind_name sysname
    declare @ind_id int
    declare @ind_rowmodctr int
    declare @updated_count int
    declare @skipped_count int
    declare @sch_id int
    declare @schema_name sysname
    declare @table_name sysname
    declare @table_id int
    declare @table_type char(2)
    declare @schema_table_name nvarchar(640) -- assuming sysname is 128 chars, 5x that, so it's > 128*4+4
    declare @compatlvl tinyint

    -- Note that we looked up from sys.objects$ instead sys.objects since some internal tables are not visible in sys.objects
    declare ms_crs_tnames cursor local fast_forward read_only for
        select name, object_id, schema_id, type from sys.objects$ o where o.type = 'U' or o.type = 'IT'
    open ms_crs_tnames
    fetch next from ms_crs_tnames into @table_name, @table_id, @sch_id, @table_type

    -- determine compatibility level
    select @compatlvl = cmptlevel from sys.sysdatabases where name = db_name()

    while (@@fetch_status <> -1) -- fetch successful
    begin
        -- generate fully qualified quoted name
        select @schema_name = schema_name(@sch_id)
        select @schema_table_name = quotename(@schema_name, '[') +'.'+ quotename(rtrim(@table_name), '[')

        -- check for table with disabled clustered index
        if (1 = isnull((select is_disabled from sys.indexes where object_id = @table_id and index_id = 1), 0))
        begin
            -- raiserror('Table ''%s'': cannot perform the operation on the table because its clustered index is disabled', -1, -1, @tablename)
            raiserror(15654, -1, -1, @schema_table_name)
        end
        else
        begin
            -- filter out local temp tables
            if ((@@fetch_status <> -2) and (substring(@table_name, 1, 1) <> '#'))
            begin
                -- reset counters for this table
                select @updated_count = 0
                select @skipped_count = 0

                -- print status message
                --raiserror('Updating %s', -1, -1, @schema_table_name)
                raiserror(15650, -1, -1, @schema_table_name)

                -- initial statement preparation: UPDATE STATISTICS [schema].[name]
                select @exec_stmt_head = 'UPDATE STATISTICS ' + @schema_table_name + ' '

                -- using another cursor to iterate through
                -- indices and stats (user and auto-created)
                -- Hekaton indexes do not appear in sys.sysindexes so we need to use sys.stats instead
                -- Hekaton indexes do not support rowmodctr so we just return 1 which will force update stats
                -- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables
                if ((@table_type = 'U') and (1 = OBJECTPROPERTY(@table_id, 'TableIsMemoryOptimized'))) -- Hekaton tables
                begin
                    set @index_names = cursor local fast_forward read_only for
                        select name, stats_id, 1 as rowmodctr
                        from sys.stats
                        where object_id = @table_id and indexproperty(object_id, name, 'ishypothetical') = 0
                        order by stats_id
                end
                else
                begin
                    set @index_names = cursor local fast_forward read_only for
                        select name, indid, rowmodctr
                        from sys.sysindexes
                        where id = @table_id and indid > 0
                            and indexproperty(id, name, 'ishypothetical') = 0
                            and indexproperty(id, name, 'iscolumnstore') = 0
                        order by indid
                end

                open @index_names
                fetch @index_names into @ind_name, @ind_id, @ind_rowmodctr

                -- if there are no stats, skip update
                if @@fetch_status < 0
                    --raiserror(' %d indexes/statistics have been updated, %d did not require update.', -1, -1, @updated_count, @skipped_count)
                    raiserror(15651, -1, -1, @updated_count, @skipped_count)
                else
                begin
                    while @@fetch_status >= 0
                    begin
                        -- create quoted index name
                        declare @ind_name_quoted nvarchar(258)
                        select @ind_name_quoted = quotename(@ind_name, '[')

                        -- reset options
                        select @options = ''

                        declare @is_ver_current bit
                        select @is_ver_current = stats_ver_current(@table_id, @ind_id)

                        -- note that <> 0 should work against old and new rowmodctr logic (when it is always > 0)
                        -- also, force a refresh if the stats blob version is not current
                        if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))
                        begin
                            select @exec_stmt = @exec_stmt_head + @ind_name_quoted

                            -- Add FULLSCAN for hekaton tables
                            -- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables
                            if ((@table_type = 'U') and (1 = OBJECTPROPERTY(@table_id, 'TableIsMemoryOptimized'))) -- Hekaton tables
                                select @options = 'FULLSCAN'
                            -- add resample if needed
                            else if (upper(@resample)='RESAMPLE')
                                select @options = 'RESAMPLE '

                            if (@compatlvl >= 90)
                                -- put norecompute if local properties are set to AUTOSTATS = OFF
                                -- note that ind name is unique within the object
                                if ((select no_recompute from sys.stats where object_id = @table_id and name = @ind_name) = 1)
                                begin
                                    if (len(@options) > 0) select @options = @options + ', NORECOMPUTE'
                                    else select @options = 'NORECOMPUTE'
                                end

                            if (len(@options) > 0)
                                select @exec_stmt = @exec_stmt + ' WITH ' + @options

                            --print @exec_stmt
                            exec (@exec_stmt)

                            --raiserror(' %s has been updated...', -1, -1, @ind_name_quoted)
                            raiserror(15652, -1, -1, @ind_name_quoted)
                            select @updated_count = @updated_count + 1
                        end
                        else
                        begin
                            --raiserror(' %s, update is not necessary...', -1, -1, @ind_name_quoted)
                            raiserror(15653, -1, -1, @ind_name_quoted)
                            select @skipped_count = @skipped_count + 1
                        end

                        fetch @index_names into @ind_name, @ind_id, @ind_rowmodctr
                    end

                    --raiserror(' %d index(es)/statistic(s) have been updated, %d did not require update/disabled.', -1, -1, @updated_count, @skipped_count)
                    raiserror(15651, -1, -1, @updated_count, @skipped_count)
                end

                deallocate @index_names
            end
        end

        print ' '
        fetch next from ms_crs_tnames into @table_name, @table_id, @sch_id, @table_type
    end

    raiserror(15005,-1,-1)
    deallocate ms_crs_tnames
    return(0) -- sp_updatestats
I spend quite a bit of time writing about query tuning on this blog. I’ve written (re-written and am actively re-writing) books on query tuning. But what I like most is talking about query tuning. I love giving sessions at various events on different aspects of query tuning, but, what I like the most is spending a whole day, trying to do a complete brain dump to get as much information out there as possible. Sound attractive? Then I’ve got a great deal for you. Come to Louisville on June 20th, 2014. We will talk query tuning at length. You have a specific question? Let’s get it answered. Then, the next day, we can all go to SQL Saturday 286 there in Louisville to get more learning and some serious networking. What’s not to like?
Let’s have some fun.
This Friday, November 1, 2013, I’m putting on an all day seminar on query tuning. It’s set up as a pre-conference event for SQL Saturday 255 in Dallas. It’s a 200 level course on understanding how the query optimizer works, the importance of statistics, constraints and indexes, how to read execution plans, and how to take all that knowledge and go to work on tuning your queries.
Here’s the fun. Sign up for the seminar, and bring a nasty query you’ve been trying to tune or a query you don’t understand or an execution plan that’s making you crazy. Depending on the time available near the end of the day, we’ll walk through a few of them. I’ve slightly restructured the seminar so I have some flexibility near the end to do this. It should be a blast. Don’t bring anything that contains sensitive data because I’m going to put it up on the big board in front of your peers. Also, don’t worry about blame. We all know Timmy wrote that query, not you.
There are only a couple of days left to sign up. If you are in the Dallas area and you want to learn query tuning, and maybe have a giggle along the way, please, click the link and register now.
I was presenting a session on how to read execution plans when I received a question: do you have a specific example of how you can use the query hash to identify similar query plans? I do, but I couldn’t show it right then, so the person asking requested this blog post.
If you’re dealing with lots of application-generated, dynamic or ad hoc T-SQL queries, then attempting to determine tuning opportunities, missing indexes, incorrect structures, etc., becomes much more difficult because you don’t have a single place to go to see what’s happening. Each ad hoc query looks different… or does it? Introduced in SQL Server 2008 and available in the standard Dynamic Management Objects (DMOs), the query hash gives us a mechanism to identify ad hoc queries that are similar in structure.
Query hash values are available in the following DMOs: sys.dm_exec_requests and sys.dm_exec_query_stats. Those two cover pretty much everything you need: what’s executing right now and what has recently executed (well, what is still in cache that was recently executed; if a query isn’t in cache, you won’t see it). The query hash value itself is nothing other than the output from a hash mechanism. A hash is a formula that outputs a value based on its input; for the same, or similar, input, you get the same value. There’s also a query_plan_hash value that’s a hash of the execution plan.
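As a hedged illustration of why this matters, you can aggregate the plan cache by query_hash to see how many cached entries share a structure; large buckets of single-use entries are usually your ad hoc offenders:

```sql
-- Group cached query stats by query_hash to surface structurally similar queries.
SELECT deqs.query_hash,
       COUNT(*) AS cached_entries,
       SUM(deqs.execution_count) AS total_executions,
       SUM(deqs.total_worker_time) AS total_cpu_time
FROM sys.dm_exec_query_stats AS deqs
GROUP BY deqs.query_hash
ORDER BY cached_entries DESC;
```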
Let’s see this in action. Here is a query:
SELECT soh.AccountNumber
     , soh.DueDate
     , sod.OrderQty
     , p.Name
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product AS p
    ON p.ProductID = sod.ProductID
WHERE p.Name LIKE 'Flat%';
And if I modify it just a little, like you might with dynamically generated code:
SELECT soh.AccountNumber
     , soh.DueDate
     , sod.OrderQty
     , p.Name
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product AS p
    ON p.ProductID = sod.ProductID
WHERE p.Name LIKE 'HL%';
What I’ve got is essentially the same query. Yes, if I were to parameterize that WHERE clause by creating a stored procedure or a parameterized query it would be identical, but in this case it is not. In the strictest terms, those are two different strings. But, they both hash to the same value: 0x5A5A6D8B2DA72E25. But, here’s where it gets fun. The first query returns no rows at all, but the second returns 8,534. They have identical query hash values, but utterly different execution plans:
Now, how to use this? First, let’s say you’re looking at the second execution plan. You note the scan of the SalesOrderHeader clustered index and decide you might want to add an index here, but you’re unsure of how many other queries behave like this one. You first look at the properties of the SELECT operator. That actually contains the query hash value. You get that value and plug it into a query something like this:
SELECT deqs.query_hash
     , deqs.query_plan_hash
     , deqp.query_plan
     , dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqs.query_hash = 0x5A5A6D8B2DA72E25;
This resulted in eight rows. Yeah, eight. That’s because I had run this same query at different times, in different ways and combinations, so the query hash, which is generated per statement, was common to them all, even though there were all different sorts of plans and issues. But one common thread ran through them all: a scan on the clustered index as the most expensive operator. So, now that I can identify lots of different common access paths, all of which share a common problem, I can propose a solution that will help out in multiple locations.
The problem with this is, what if I modify the query like this:
SELECT soh.AccountNumber
     , soh.DueDate
     , sod.OrderQty
     , p.Name
     , p.Color
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON sod.SalesOrderID = soh.SalesOrderID
JOIN Production.Product AS p
    ON p.ProductID = sod.ProductID
WHERE p.Name LIKE 'HL%';
Despite the fact that this query has an execution plan identical to the one above, the hash value, because of the added p.Color column, is now 0xABB5AFDC5A4A6988. But, worth noting, the query_plan_hash values for both of these queries are the same, so you can run the same kind of search for common plans produced by different queries to identify potential tuning opportunities.
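That search is the same shape as the query_hash search earlier; only the column changes. A sketch (substitute the query_plan_hash value you pull from the SELECT operator’s properties; the zero value below is just a placeholder):

```sql
-- Find all cached queries that produced the same plan shape.
DECLARE @plan_hash binary(8) = 0x0000000000000000; -- placeholder value

SELECT deqs.query_plan_hash,
       deqs.query_hash,
       dest.text,
       deqp.query_plan
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqs.query_plan_hash = @plan_hash;
```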
So, the ability to pull this information is not a magic bullet that will help you solve all your ad hoc and dynamic T-SQL issues. It’s just another tool in the tool box.
UPDATE: In the text I had incorrectly said this was introduced in 2005. It was 2008. I’ve updated it and added this little footnote. Sorry for any confusion.
First thing, there are no bad operators, just bad parents, uh, I mean query writers, or database designers, or ORM tools. Why do I say this? Because all the operators within a query execution plan serve a purpose. They are there to fulfill a task. Depending on where and when you see them, they’re doing exactly what you ask of them. The issues come up because you’re asking them to do a task that they may not be well suited for. This comes from inappropriate structures and inappropriate code. Lazy spools are not really bad (that was just link bait). In fact, depending on the query, what’s being done, how you’re retrieving data, what data is being retrieved, the lazy spool is actually awesome. But, it’s good to know what these things are doing because, like magic, the lazy spool comes with a price.
The spool operators within SQL Server are indications of the need to store data for reuse. That’s it. A spool is just gathering up information, putting it into a temp table so that it can use that data again. Key word and trick phrase in that last sentence: temp table. Again, depending on the needs of the query, temporary storage is a great solution. BUT, you are creating more objects in tempdb, with additional overhead on your disks and everything associated with that. So if your system is already under load in this area, then depending on the query in question, you may need to adjust in order to eliminate the spool. The use of tempdb is the price you pay for the lazy spool.
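If you want to see that tempdb price being paid, spool worktables count as internal objects in the tempdb space-usage DMVs. A hedged sketch; run your spooling query in one session and watch its allocations from another:

```sql
-- Internal object allocations (worktables for spools, sorts, hashes) per session.
SELECT ddsss.session_id,
       ddsss.internal_objects_alloc_page_count,
       ddsss.internal_objects_dealloc_page_count
FROM sys.dm_db_session_space_usage AS ddsss
ORDER BY ddsss.internal_objects_alloc_page_count DESC;
```

Note that these counters lump spools together with sort and hash worktables, so treat them as a rough indicator, not a spool-specific measure.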
A lazy spool is a non-blocking operator. That means it doesn’t stop the flow of data while it gets loaded. Instead, it acts more or less as a pass-through, letting data flow through it as needed by the downstream operations. That makes a lot of sense when you consider that the lazy spool is loaded in a “lazy” fashion, or as requested. The eager spool, by contrast, stops all operations until it completes loading its temporary object, loading in an “eager” fashion. Once data is loaded into the lazy spool, it can be used by other operators within the plan.
Let’s see this in action. I’m going to call a recursive CTE query from AdventureWorks2012:
EXEC dbo.uspGetEmployeeManagers @BusinessEntityID = 9
This query results in the following execution plan (click on it to expand it).
Reading plans that involve spools is where you really see the need to understand both the logical and physical processing order within a plan. The reason I say that is that if you just read this from right to left, the physical order, you’re likely to think that the spool is being loaded all the way over there on the right. Instead, it’s being loaded by Node ID 5, which I show below.
This shows how the logical processing order, left to right, like reading a book, has to be taken into account when dealing with query plans. Now, in this case, the spool is how the optimizer is dealing with the recursive nature of the query. It loads data into a spool operator in a lazy fashion, and that data is there when needed later within the operations of the query. In short, the lazy spool in this case is doing a good thing and helping you out.
So, why is the lazy spool bad? Well, it isn’t. It’s just not free.