Category: SQL Server 2005

Mar 19 2014

Query Tuning Near You

It really is so much easier to just throw hardware at badly performing databases. Just buy a bigger, faster server with more and faster disks and you can put off doing tuning work for another 6-9 months, easily. But for most of us, sooner or later, the performance problems get too big, or we just don't have any more money to spend, and we're stuck. We have to tune the queries. And frankly, query tuning is a pain in the nether regions.

But, after you've tuned queries 20 or 30 times, you start to recognize the patterns and it gets easier (never easy, just not as hard). But if you haven't done it 20 or 30 times, what do you do? My suggestion: talk to someone who has done it 30 times (or even a couple of hundred times), like me, for example.

I have an all-day session on tuning queries. It goes from understanding how the optimizer works (which will automatically lead you to write better queries), to how to gather performance metrics (so you know where the pain points are located), to reading execution plans (you need to know what has gone wrong with the query), to various mechanisms for fixing the query. This information is applicable to systems from SQL Server 2005 to SQL Server 2014 (sorry, everyone still on 2000, it's time to upgrade). The session is based on the books I've written about query tuning and execution plans, plus years and years of doing lots of query tuning.

Right now I've got two events scheduled. Before SQL Saturday #286 in Louisville, KY, I'll be putting on this precon. Seating is limited, so don't wait. You can go here to register. Then we can get together the next day at the SQL Saturday event to get some more education from all the great speakers there. Next, before SQL Saturday #302 in Albany, NY (their first one, ever), I'll be hosting this. You can register by clicking here. Don't miss the early bird special. Again, the next day will be filled with learning at the SQL Saturday event.

I’m working on taking this to other locations and venues. If you’re interested, please get in touch. I’ll do what I can to come to you.

If you have a particularly thorny query, bring it along with an actual execution plan. If we have time at the end of the day, I'll take a look and make suggestions, live (uh, please, no sensitive patient data or anything like that).

Let’s get together and talk query tuning.

Mar 18 2014

Finding Mistakes

Ever had that moment where you start getting errors from code that you’ve tested a million times? I had that one recently. I had this little bit of code for pulling information directly from query plans in cache:

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
QueryPlans AS
(
SELECT  RelOp.pln.value(N'@PhysicalOp', N'varchar(50)') AS OperatorName,
        RelOp.pln.value(N'@NodeId', N'integer') AS NodeId,
        RelOp.pln.value(N'@EstimateCPU', N'decimal(10,9)') AS CPUCost,
        RelOp.pln.value(N'@EstimateIO', N'decimal(10,9)') AS IOCost,
        dest.text
FROM    sys.dm_exec_query_stats AS deqs
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
        CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
        CROSS APPLY deqp.query_plan.nodes(N'//RelOp') RelOp (pln)
)
SELECT  qp.OperatorName,
        qp.NodeId,
        qp.CPUCost,
        qp.IOCost,
        qp.CPUCost + qp.IOCost AS EstimatedCost
FROM    QueryPlans AS qp
WHERE   qp.text = 'some query or other in cache'
ORDER BY EstimatedCost DESC;

I’ve probably run this… I don’t know how many times. But… I’m suddenly getting an error:

Msg 8114, Level 16, State 5, Line 7
Error converting data type nvarchar to numeric.

What the hell? There is nowhere this should be occurring. I dig through the query over and over and I can't figure it out. Until… I finally notice that one of the operators in the plan has the CPUCost value stored as a float:

EstimateCPU="1e-006"

Ummmm, since when? Since forever. I’ve just been lucky with my code. I’d just never hit a sufficiently small cost in the plans before. I hadn’t bothered to look for the actual data type in use in the schema definition, although it’s right there:

<xsd:attribute name="EstimateCPU" type="xsd:double" use="required"/>

[Image: Samuel L. Jackson in The Long Kiss Goodnight]

I never did one thing right in my life, you know that? Not one. That takes skill.

What did I do wrong? I was looking at the data output from the queries and in the plans rather than looking at the structure to know what to expect. It's the kind of thing I would never do with T-SQL. I would always look to the table structure to know what data type a given column was. But in this case, with the XML, I looked at the data and made an assumption. And we all know what that means. It makes an ass out of you and mption.

Or, in this case, me and mption.
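
By the way, the failure is trivially easy to reproduce on its own. Here's a minimal sketch, just a couple of ad hoc conversions unrelated to the plan queries, showing why the decimal conversion blew up while float is fine:

-- Converting a scientific-notation string to float works...
SELECT CONVERT(float, N'1e-006');
-- ...but converting the same string to decimal fails with
-- Msg 8114: Error converting data type nvarchar to numeric.
SELECT CONVERT(decimal(10,9), N'1e-006');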

Anyway, the corrected query is a pretty trivial change:

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
QueryPlans AS
(
SELECT  RelOp.pln.value(N'@PhysicalOp', N'varchar(50)') AS OperatorName,
        RelOp.pln.value(N'@NodeId', N'integer') AS NodeId,
        RelOp.pln.value(N'@EstimateCPU', N'float') AS CPUCost,
        RelOp.pln.value(N'@EstimateIO', N'float') AS IOCost,
        dest.text
FROM    sys.dm_exec_query_stats AS deqs
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
        CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
        CROSS APPLY deqp.query_plan.nodes(N'//RelOp') RelOp (pln)
)
SELECT  qp.OperatorName,
        qp.NodeId,
        qp.CPUCost,
        qp.IOCost,
        qp.CPUCost + qp.IOCost AS EstimatedCost
FROM    QueryPlans AS qp
WHERE   qp.text = 'SELECT * FROM HumanResources.vEmployee AS ve'
ORDER BY EstimatedCost DESC;

But I do feel bad if anyone has been using this the way I showed it. 'Cause, yeah, I've demonstrated with this code in the past. Oops. Still, it makes a great point: especially when working with a public XML schema like this, it pays to go and look at that schema the same way you would a table, in order to ensure that you're using the correct data types.

Mar 11 2014

sp_updatestats Is Not Smart

No, I don’t mean the use of sp_updatestats is not smart. It’s a fine, quick mechanism for getting statistics updated in your system. But the procedure itself is not smart. I keep seeing stuff like “sp_updatestats knows which statistics need to be updated” and similar statements.

Nope.

Not true.

Wanna know how I know? It’s tricky. Ready? I looked at the query. It’s there, in full, at the bottom of the article (2014 CTP2 version, just in case yours is slightly different, like, for example, no Hekaton logic). Let’s focus on just this bit:

if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))

The most interesting part is right at the front: @ind_rowmodctr <> 0. That value is loaded by the cursor and comes from the rowmodctr column in sys.sysindexes. In short, the "smart" aspect of sp_updatestats is that it assumes that if there are any modifications at all, then updating statistics is good to go. We can argue for hours over how exactly you determine whether or not statistics are sufficiently out of date to warrant an update, but I'd be willing to bet that the sophisticated answers are not going to include just finding everything that's been touched.
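
If you want to see the same counter sp_updatestats is looking at, you can query it yourself. Here's a quick sketch, leaning on the same deprecated sys.sysindexes compatibility view that the procedure uses:

-- Everything with any modifications at all, which is everything
-- sp_updatestats would update, sorted by how much has changed.
SELECT  o.name AS table_name,
        si.name AS index_or_stats_name,
        si.rowmodctr
FROM    sys.sysindexes AS si
        JOIN sys.objects AS o ON o.object_id = si.id
WHERE   si.indid > 0
        AND si.rowmodctr <> 0
ORDER BY si.rowmodctr DESC;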

Now, don’t get me wrong. I’m not implying, suggesting or stating that sp_updatestats shouldn’t be used. It should. It’s fine. Just be very clear about what it does and how it does it.

Just a reminder, I’m putting on an all day seminar on query tuning in Louisville on June 20th, 2014. Seats are going fast, so please sign up early.

sp_updatestats:

USE [master]
GO
/****** Object:  StoredProcedure [sys].[sp_updatestats]    Script Date: 3/6/2014 8:09:58 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER procedure [sys].[sp_updatestats]
	@resample char(8)='NO'
as

	declare @dbsid varbinary(85)

	select @dbsid = owner_sid
		from sys.databases
		where name = db_name()

	-- Check the user sysadmin
	if not is_srvrolemember('sysadmin') = 1 and suser_sid() <> @dbsid
	begin
		raiserror(15247,-1,-1)
		return (1)
	end
	-- cannot execute against R/O databases  
	if DATABASEPROPERTYEX(db_name(), 'Updateability')=N'READ_ONLY'
	begin
		raiserror(15635,-1,-1,N'sp_updatestats')
		return (1)
	end

	if upper(@resample)<>'RESAMPLE' and upper(@resample)<>'NO'
	begin
		raiserror(14138, -1, -1, @resample)
		return (1)
	end

	-- required so it can update stats on ICC/IVs
	set ansi_warnings on
	set ansi_padding on
	set arithabort on
	set concat_null_yields_null on
	set numeric_roundabort off

	declare @exec_stmt nvarchar(4000)		-- "UPDATE STATISTICS [sysname].[sysname] [sysname] WITH RESAMPLE NORECOMPUTE"
	declare @exec_stmt_head nvarchar(4000)	-- "UPDATE STATISTICS [sysname].[sysname] "
	declare @options nvarchar(100)			-- "RESAMPLE NORECOMPUTE"

	declare @index_names cursor

	declare @ind_name sysname
	declare @ind_id int
	declare @ind_rowmodctr int
	declare @updated_count int
	declare @skipped_count int

	declare @sch_id int
	declare @schema_name sysname
	declare @table_name sysname
	declare @table_id int
	declare @table_type char(2)
	declare @schema_table_name nvarchar(640) -- assuming sysname is 128 chars, 5x that, so it's > 128*4+4

	declare @compatlvl tinyint

	-- Note that we looked up from sys.objects$ instead sys.objects since some internal tables are not visible in sys.objects
	declare ms_crs_tnames cursor local fast_forward read_only for
		select name, object_id, schema_id, type from sys.objects$ o
		where o.type = 'U' or o.type = 'IT'
	open ms_crs_tnames
	fetch next from ms_crs_tnames into @table_name, @table_id, @sch_id, @table_type

	-- determine compatibility level
	select @compatlvl = cmptlevel from sys.sysdatabases where name = db_name()

	while (@@fetch_status <> -1) -- fetch successful
	begin
		-- generate fully qualified quoted name
		select @schema_name = schema_name(@sch_id)
		select @schema_table_name = quotename(@schema_name, '[') +'.'+ quotename(rtrim(@table_name), '[')

		-- check for table with disabled clustered index
		if (1 = isnull((select is_disabled from sys.indexes where object_id = @table_id and index_id = 1), 0))
		begin
			-- raiserror('Table ''%s'': cannot perform the operation on the table because its clustered index is disabled', -1, -1, @tablename)
			raiserror(15654, -1, -1, @schema_table_name)
		end
		else
		begin
			-- filter out local temp tables
			if ((@@fetch_status <> -2) and (substring(@table_name, 1, 1) <> '#'))
			begin
				-- reset counters for this table
				select @updated_count = 0
				select @skipped_count = 0

				-- print status message
				--raiserror('Updating %s', -1, -1, @schema_table_name)
				raiserror(15650, -1, -1, @schema_table_name)

				-- initial statement preparation: UPDATE STATISTICS [schema].[name]
				select @exec_stmt_head = 'UPDATE STATISTICS ' + @schema_table_name + ' '

				-- using another cursor to iterate through
				-- indices and stats (user and auto-created)
				-- Hekaton indexes do not appear in sys.sysindexes so we need to use sys.stats instead
				-- Hekaton indexes do not support rowmodctr so we just return 1 which will force update stats
				-- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables
				if ((@table_type = 'U') and (1 = OBJECTPROPERTY(@table_id, 'TableIsMemoryOptimized')))	-- Hekaton tables
				begin
					set @index_names = cursor local fast_forward read_only for
						select name, stats_id, 1 as rowmodctr
						from sys.stats
						where object_id = @table_id and indexproperty(object_id, name, 'ishypothetical') = 0 
						order by stats_id
				end
				else
				begin
					set @index_names = cursor local fast_forward read_only for
						select name, indid, rowmodctr from sys.sysindexes
						where id = @table_id and indid > 0 and indexproperty(id, name, 'ishypothetical') = 0 
						and indexproperty(id, name, 'iscolumnstore') = 0
						order by indid
				end

				open @index_names
				fetch @index_names into @ind_name, @ind_id, @ind_rowmodctr

				-- if there are no stats, skip update
				if @@fetch_status < 0
					--raiserror('    %d indexes/statistics have been updated, %d did not require update.', -1, -1, @updated_count, @skipped_count)
					raiserror(15651, -1, -1, @updated_count, @skipped_count)
				else 
				begin
					while @@fetch_status >= 0
					begin
						-- create quoted index name
						declare @ind_name_quoted nvarchar(258)
						select @ind_name_quoted = quotename(@ind_name, '[')

						-- reset options
						select @options = ''

						declare @is_ver_current bit
						select @is_ver_current = stats_ver_current(@table_id, @ind_id)

						-- note that <> 0 should work against old and new rowmodctr logic (when it is always > 0)
						-- also, force a refresh if the stats blob version is not current
						if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))
						begin
							select @exec_stmt = @exec_stmt_head + @ind_name_quoted

							-- Add FULLSCAN for hekaton tables
							-- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables
							if ((@table_type = 'U') and (1 = OBJECTPROPERTY(@table_id, 'TableIsMemoryOptimized')))	-- Hekaton tables
								select @options = 'FULLSCAN'

							-- add resample if needed
							else if (upper(@resample)='RESAMPLE')
								select @options = 'RESAMPLE '

							if (@compatlvl >= 90)
								-- put norecompute if local properties are set to AUTOSTATS = OFF
								-- note that ind name is unique within the object
								if ((select no_recompute from sys.stats where object_id = @table_id and name = @ind_name) = 1)
								begin
									if (len(@options) > 0) select @options = @options + ', NORECOMPUTE'
									else select @options = 'NORECOMPUTE'
								end

							if (len(@options) > 0)
								select @exec_stmt = @exec_stmt + ' WITH ' + @options

							--print @exec_stmt
							exec (@exec_stmt)
							--raiserror('    %s has been updated...', -1, -1, @ind_name_quoted)
							raiserror(15652, -1, -1, @ind_name_quoted)
							select @updated_count = @updated_count + 1
						end
						else
						begin
							--raiserror('    %s, update is not necessary...', -1, -1, @ind_name_quoted)
							raiserror(15653, -1, -1, @ind_name_quoted)
							select @skipped_count = @skipped_count + 1
						end
						fetch @index_names into @ind_name, @ind_id, @ind_rowmodctr
					end
					--raiserror('    %d index(es)/statistic(s) have been updated, %d did not require update/disabled.', -1, -1, @updated_count, @skipped_count)
					raiserror(15651, -1, -1, @updated_count, @skipped_count)
				end
				deallocate @index_names
			end
		end
		print ' '
		fetch next from ms_crs_tnames into @table_name, @table_id, @sch_id, @table_type
	end
	raiserror(15005,-1,-1)
	deallocate ms_crs_tnames
	return(0) -- sp_updatestats

Feb 27 2014

Let’s Talk Query Tuning

I spend quite a bit of time writing about query tuning on this blog. I’ve written (re-written and am actively re-writing) books on query tuning. But what I like most is talking about query tuning. I love giving sessions at various events on different aspects of query tuning, but, what I like the most is spending a whole day, trying to do a complete brain dump to get as much information out there as possible. Sound attractive? Then I’ve got a great deal for you. Come to Louisville on June 20th, 2014. We will talk query tuning at length. You have a specific question? Let’s get it answered. Then, the next day, we can all go to SQL Saturday 286 there in Louisville to get more learning and some serious networking. What’s not to like?

Feb 18 2014

The CASE Statement and Performance

In case you don’t know, this query:

UPDATE dbo.Test1
SET C2 = 2
WHERE C1 LIKE '%33%';

Will run quite a bit slower than this query:

UPDATE dbo.Test1
SET C2 = 1
WHERE C1 LIKE '333%';

Or this one:

UPDATE dbo.Test1
SET C2 = 1
WHERE C1 = '333';

That's because the second two queries have arguments in the filter criteria that allow SQL Server to use the statistics in an index to look for specific matching values, and then use the balanced tree, or B-Tree, of the index to retrieve specific rows. The argument in the first query, with its leading wildcard, requires a full scan against the index because there is no way to know what values might match, nor any path through the index to simply retrieve them.

But, what if we do this:

UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '19%' THEN 3
WHEN C1 LIKE '25%' THEN 2
WHEN C1 LIKE '37%' THEN 1
END;

We're avoiding that nasty wildcard search, right? So the optimizer should just be able to immediately find those values and retrieve them… Whoa! Hold up there pardner. Let's set up a full test:

IF (SELECT  OBJECT_ID('Test1')
   ) IS NOT NULL 
    DROP TABLE dbo.Test1; 
GO
CREATE TABLE dbo.Test1 (C1 VARCHAR(50),C2 INT, C3 INT IDENTITY);

SELECT TOP 1500
        IDENTITY( INT,1,1 ) AS n
INTO    #Nums
FROM    Master.dbo.SysColumns sC1,
        Master.dbo.SysColumns sC2;
        
INSERT  INTO dbo.Test1
        (C1,C2)
        SELECT  n, n
        FROM    #Nums;
        
DROP TABLE #Nums;

CREATE CLUSTERED INDEX i1 ON dbo.Test1 (C1) ;


UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '%42%' THEN 3
WHEN C1 LIKE '%24%' THEN 2
WHEN C1 LIKE '%36%' THEN 1
END

DBCC FREEPROCCACHE;

UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '19%' THEN 33
WHEN C1 LIKE '25%' THEN 222
WHEN C1 LIKE '37%' THEN 11
WHEN C1 LIKE '22%' THEN 5
END

I added the extra CASE evaluation in the second query in order to get a different query hash value.

Here are the execution plans from the two queries:

[Image: the two execution plans, side by side]

They're pretty much identical. Apart from the difference I forced in the hash values, the only variation is in the details of the Compute Scalar operator. So what's going on? Shouldn't that second query use the index to retrieve the values? After all, it avoided that nasty comparison operator, right? Well, yes, but… we introduced a function on the columns. What function, you ask? The CASE statement itself.

This means you can't use a CASE statement in this manner, because it bypasses the index and statistics in exactly the same way that running functions against the columns does.
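
If you need the index used, one option is to break the logic into separate statements with sargable predicates. Here's a sketch of that approach (note that it isn't semantically identical: the CASE version also sets non-matching rows to NULL, while these statements leave them untouched):

-- Each UPDATE has a leading-match predicate on C1, so the optimizer can
-- seek on the clustered index i1 rather than touching every row.
UPDATE dbo.Test1 SET C2 = 3 WHERE C1 LIKE '19%';
UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '25%';
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '37%';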

Jan 20 2014

Time for a Quick Rant

This is an actual quote from what we can only assume is a functional human being:

The database is very big so we stopped taking backup’s.

Eight lords a-leaping, are you kidding me? Seriously! Seriously? By the Great Gu and all the Valkyries in Valhalla, you stopped taking backups of your PRODUCTION database because it was "very big." And I'll put down Brobdingnagian stacks of cash that "very big" in this case is probably 200-500GB or, at worst, 1-2TB. People, assuming you have enough brain stem intact to regulate breathing, you must know, you must by all the sparkly vampires in Twilight KNOW that you need to have backups. Right? I mean, nothing ever goes wrong on this shiny marble we call Dirt, does it? No one would EVER just run a DELETE statement in production without a WHERE clause, would they, you hairy bottomed tree climbing mouth breather? And I'm sure that, if it happened, you could just blow magic unicorn powder at the server in order to get the company's billing list back, right? Because without that backup, you're relying on the undivided attention, and total positive intent, of Odin and Freya to ensure that you never, oh, I don't know, lose power, causing the rocker arm on the disk to bash down repeatedly on the platter like Thor's Hammer on an Ice Giant's head, with similar results for your database. Until you have so much data that EMC hosts your company's holiday party, for free, there is enough disk space, somewhere, to take a backup of your database.

Now, get out there and get it done. Don't make me travel to each and every one of your places of work with the lead-weighted hickory learning bat to lay some education upside your beanie holder. Please, just take a backup.

I really mean it this time. Back up your database.

Jan 15 2014

Database in Source Control

Many years ago, I was working with a great DBA. Seriously, a very smart and capable guy. He told me, “We need to put the database into source control, just like app code.” And I just laughed. Not because I disagreed with him. I knew he was right, but I had tried, several times, to do just that. See, I’m not really a DBA. I’m a developer. I knew that code (and all the T-SQL that describes databases is code) needed to be versioned, sourced, tracked and audited. But great googly moogly, it was not an easy thing to do.

I first tried just exporting the entire database into a script and then occasionally checking that script into source control. Yay! Mission Accomplished… Well, I had a database in source control, yes, but I didn’t have any of the great stuff that went with it, most of all, a way to deploy from source control.

Next, I tried just storing the stuff that changed most: procedures. But I had to store everything as an ALTER, or I had to store it all as a DROP/CREATE and store the security settings and extended properties. I tried both. Neither approach satisfied, and it was WAY too easy for someone else to modify a script the wrong way and bring the entire thing crashing down. Not to mention the fact that any and all structural changes outside of stored procedures had to be built manually, or generated using a compare tool (but not the procs, 'cause we have those in source control, remember) by comparing prod & dev or qa & dev or something & something… Oh yeah, that was fun.

Man, it was painful back then. But now, there are several ways you can do this using Microsoft and/or 3rd party tools.

Why aren’t you?

Seriously, most of you aren't. I've been going all over the country teaching a class on database deployments (next one is in Cleveland if you're interested) and I know most people don't put their databases into source control. Of course, I'm pretty sure most people don't talk to their Dev teams if they can help it, and it does seem like most Dev teams are on less than a perfectly chatty basis with their DBAs. Is that the cause? The thing is, you are experiencing pain in terms of time spent, mistakes made, slower deployments, less frequent deployments, possibly even downtime, all because you don't do your deployments right. And deployments start from source control.

Developers have spent the last 30 years or so figuring out better and better ways to get functional code into the hands of their customers (internal or external) as fast as possible, as accurately as possible. Over the same 30 years, DBAs have been figuring out how to better and better protect the information under our charge, ensuring that it's backed up, available, performing well, and always on (to use a phrase). My suggestion to you, data pro: talk to your developers. Figure out what they do and how they do it. Take advantage of their years and years of process improvement and apply what they've learned to your database development and deployment.

There's a new concept growing out there, one fairly well established within the *nix communities: DevOps. It's the realization that the world doesn't begin and end with your database/server/application. Instead, your database/server/application is completely dependent on the other databases/servers/applications around it in order to run. Notice, no one is more important here. We're talking about creating mechanisms for teams to deliver functionality to their clients (internal or external). And it all starts with the realization that there are parts of the process that some of us are better at than others. Developers know how to develop and deploy within teams. Let's learn from them. And the start, well, that's source control.

So, is your database under source control… for real? If not, get it there. The excuses I used to have are gone. That means the excuses you have now are probably gone too.

Fair warning, I may use the term DevOps more in the future.

Oct 29 2013

Query Tuning in Dallas

Let’s have some fun.

This Friday, November 1, 2013, I’m putting on an all day seminar on query tuning. It’s set up as a pre-conference event for SQL Saturday 255 in Dallas. It’s a 200 level course on understanding how the query optimizer works, the importance of statistics, constraints and indexes, how to read execution plans, and how to take all that knowledge and go to work on tuning your queries.

Here’s the fun. Sign up for the seminar, and bring a nasty query you’ve been trying to tune or a query you don’t understand or an execution plan that’s making you crazy. Depending on the time available near the end of the day, we’ll walk through a few of them. I’ve slightly restructured the seminar so I have some flexibility near the end to do this. It should be a blast. Don’t bring anything that contains sensitive data because I’m going to put it up on the big board in front of your peers. Also, don’t worry about blame. We all know Timmy wrote that query, not you.

There are only a couple of days left to sign up. If you are in the Dallas area and you want to learn query tuning, and maybe have a giggle along the way, please, click the link and register now.

Sep 18 2013

Finding Ad Hoc Queries with Query Hash

I was presenting a session on how to read execution plans when I received a question: Do you have a specific example of how you can use the query hash to identify similar query plans? I do, but I couldn't show it right then, so the person asking requested this blog post.

If you're dealing with lots of application-generated, dynamic or ad hoc T-SQL queries, then attempting to determine tuning opportunities, missing indexes, incorrect structures, etc., becomes much more difficult because you don't have a single place to go to see what's happening. Each ad hoc query looks different… or does it? Introduced in SQL Server 2008 and available in the standard Dynamic Management Objects (DMOs), the query hash gives us a mechanism to identify ad hoc queries that are similar in structure.

Query hash values are available in the following DMOs: sys.dm_exec_requests and sys.dm_exec_query_stats. Those two cover pretty much everything you need: what's executing right now, and what has recently executed (well, what is still in cache that was recently executed; if a query isn't in cache, you won't see it). The query hash value itself is nothing other than the output from a hash mechanism. A hash is a formula that outputs a value based on its input, and for the same, or similar, input, you get the same value. There's also a query_plan_hash value, which is a hash of the execution plan.

Let’s see this in action. Here is a query:

SELECT  soh.AccountNumber ,
        soh.DueDate ,
        sod.OrderQty ,
        p.Name
FROM    Sales.SalesOrderHeader AS soh
        JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
        JOIN Production.Product AS p ON p.ProductID = sod.ProductID
WHERE   p.Name LIKE 'Flat%';

And if I modify it just a little, like you might with dynamically generated code:

SELECT  soh.AccountNumber ,
        soh.DueDate ,
        sod.OrderQty ,
        p.Name
FROM    Sales.SalesOrderHeader AS soh
        JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
        JOIN Production.Product AS p ON p.ProductID = sod.ProductID
WHERE   p.Name LIKE 'HL%';

What I've got is essentially the same query. Yes, if I were to parameterize that WHERE clause by creating a stored procedure or a parameterized query, they would be identical, but in this case they are not. In the strictest terms, those are two different strings. But they both hash to the same value: 0x5A5A6D8B2DA72E25. And here's where it gets fun. The first query returns no rows at all, but the second returns 8,534. They have identical query hash values, but utterly different execution plans:

[Image: the two execution plans for the hash-identical queries]

Now, how to use this? First, let’s say you’re looking at the second execution plan. You note the scan of the SalesOrderHeader clustered index and decide you might want to add an index here, but you’re unsure of how many other queries behave like this one. You first look at the properties of the SELECT operator. That actually contains the query hash value. You get that value and plug it into a query something like this:

SELECT  deqs.query_hash ,
        deqs.query_plan_hash ,
        deqp.query_plan ,
        dest.text
FROM    sys.dm_exec_query_stats AS deqs
        CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   deqs.query_hash = 0x5A5A6D8B2DA72E25;

This resulted in eight rows. Yeah, eight. That's because I had run this same query at different times, in different ways, and in combinations, so the query hash, which is generated on each statement, was common to all of them, even though there were all different sorts of plans and issues. But one common thread ran through them all: a scan on the clustered index as the most expensive operator. So, now that I can identify lots of different common access paths, all of which have a common problem, I can propose a solution that will help out in multiple locations.
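
You can also work from the other direction. Rather than starting with a plan you already have open, a hedged sketch like this one ranks the query_hash buckets in cache by total CPU, so the most expensive ad hoc patterns float to the top:

SELECT  deqs.query_hash,
        COUNT(*) AS cached_plans,
        SUM(deqs.execution_count) AS total_executions,
        SUM(deqs.total_worker_time) AS total_cpu_time
FROM    sys.dm_exec_query_stats AS deqs
GROUP BY deqs.query_hash
HAVING  COUNT(*) > 1 -- only hashes that appear in more than one cached plan
ORDER BY total_cpu_time DESC;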

The problem with this is, what if I modify the query like this:

SELECT  soh.AccountNumber ,
        soh.DueDate ,
        sod.OrderQty ,
        p.Name,
		p.Color
FROM    Sales.SalesOrderHeader AS soh
        JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
        JOIN Production.Product AS p ON p.ProductID = sod.ProductID
WHERE   p.Name LIKE 'HL%';

Despite the fact that this query has an identical query plan to the one above, the hash value, because of the added p.Color column, is now 0xABB5AFDC5A4A6988. But, worth noting, the query_plan_hash values for both these queries are the same, so you can do the same search for common plans with different queries to identify potential tuning opportunities.
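
The search on the plan hash looks just like the search on the query hash. A quick sketch (the hash here is a placeholder; plug in the query_plan_hash value from the SELECT operator of your own plan):

SELECT  deqs.query_hash,
        deqs.query_plan_hash,
        deqp.query_plan,
        dest.text
FROM    sys.dm_exec_query_stats AS deqs
        CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   deqs.query_plan_hash = 0x0000000000000000; -- placeholder value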

So, the ability to pull this information is not a magic bullet that will help you solve all your ad hoc and dynamic T-SQL issues. It’s just another tool in the tool box.

For things like this and a whole lot more, let’s get together in November, 2013 at the all day pre-conference seminar at SQL Saturday Dallas.

UPDATE: In the text I had incorrectly said this was introduced in 2005. It was 2008. I’ve updated it and added this little footnote. Sorry for any confusion.

Sep 13 2013

Are Foreign Keys Better Than Indexes?

[Image: an apple and a hammer]

When I first saw this question I thought to myself, "Self. Don't you think that's comparing apples to hammers? Yes, Self, I'm pretty sure it is. Good, I thought so too, self. Yeah, me too." After rebooting because of the runaway iterations on that thought, I had another, "Well… hold on there self. Both types of objects, while pretty different, are taken into account by the query optimizer." I then had to admit to myself that I had a point. So the question remains, are foreign keys better than indexes?

As my first self said, these are different objects and they fulfill different purposes within SQL Server. My second self wants to point out that when you're dealing with functional objects within SQL Server, it's a bad habit to start to equate one to the other, especially when they have such radically different functions. I, myself, know that an index is meant to be a mechanism to speed the retrieval of data, while a foreign key is meant to constrain data in a relational fashion between two tables, enforcing that the data in one table can only contain values from the primary key (or a unique constraint) of the other table. But, my other self points out, indexes also work as constraints. As mentioned, you have a primary key, which is realized as an index on the table (clustered by default, but it doesn't have to be). And, in addition to the primary key, you can have a unique constraint, also realized as an index on the table. So these constructs are similar to foreign keys. But, they're very different. A foreign key simply validates, prior to allowing the data to be added to the table in question, that the values being added (or modified) match values in the key. That's it. Indexes, unique or not, are a secondary storage mechanism within SQL Server. Clustered indexes actually are the table: they have the structure of an index, the balanced tree, or b-tree, AND, at the leaf level, they have all the data that defines the table. Non-clustered indexes are sort of the same. They have a b-tree, just the same, and they have leaf levels that might have columns that are INCLUDEd in their storage.
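
To make the distinction concrete, here's a hedged sketch using hypothetical dbo.Orders and dbo.Customers tables (not anything from this post): one object stores data to speed retrieval, the other only validates values.

-- A nonclustered index: a separate b-tree copy of selected columns,
-- which exists purely to speed up data retrieval.
CREATE NONCLUSTERED INDEX ix_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue);

-- A foreign key: no storage of its own. It only checks that incoming
-- values exist in the parent table. WITH CHECK validates the existing
-- rows, which leaves the constraint trusted and usable by the optimizer.
ALTER TABLE dbo.Orders WITH CHECK
ADD CONSTRAINT fk_Orders_Customers
FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);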

Assuming your foreign key is enforced, meaning it was created WITH CHECK or a check was run after it was created, the optimizer can take advantage of the knowledge that data within the foreign key must match data within the parent table. This allows for better optimization of joins, even the elimination of joins (see an example of this here). But that's not the same as what the optimizer does with indexes. Again, this is where the data is, so the optimizer is going to make a great deal of use of indexes in retrieving data in an optimal fashion, simply because that's where the data is located (or can be found, if we're talking key lookups and stuff like that).
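
Here's a sketch of that join elimination, assuming the AdventureWorks database and its trusted foreign key from Sales.SalesOrderDetail to Sales.SalesOrderHeader:

-- No columns from SalesOrderHeader appear in the output, and the trusted
-- FK guarantees every sod.SalesOrderID has a matching header row, so the
-- optimizer can remove the join to SalesOrderHeader from the plan entirely.
SELECT  sod.OrderQty,
        sod.ProductID
FROM    Sales.SalesOrderDetail AS sod
        JOIN Sales.SalesOrderHeader AS soh
            ON soh.SalesOrderID = sod.SalesOrderID;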

So, while comparing apples & hammers, uh, I mean, foreign keys and indexes, you do hit the concept of constraints for both. But the mechanisms associated with indexes are very different from those associated with foreign keys, because indexes have storage while the foreign key does not.