Category: SQL Server 2005

Sep 23 2014

SQL Server Query Performance Tuning

The latest update to my book, SQL Server Query Performance Tuning was released last week. This is the fourth edition of the book, and the third edition that I’ve been responsible for. At the urging of my editor, Jonathan Gennick, I have completely restructured the book for this release. The chapters have been broken up and rearranged so that they’re smaller, more easily consumed. Yes, I’ve worked with my technical editor, Joe Sack, to add lots of new information and to ensure that the existing information is more accurate and more useful. But, we’ve also added new chapters on topics that weren’t given enough attention in the previous versions of the book, such as parameter sniffing. It can certainly seem like these book releases are just some incremental changes on top of existing information, but not this time. This is a new book, with a new structure and new material (almost 80 pages worth), but, hopefully, with all the usefulness of the old book fundamentally intact. Please check it out.

If reading books isn’t your thing, or, you just want some personal interaction to assist your learning, I’ll be doing an all day, pre-conference seminar on query tuning at the PASS Summit this year, 2014. Go here to register.

Sep 03 2014

Left or Right?

No, this is not about politics. It’s about your WHERE clause… and your JOIN criteria… and your HAVING clause. It’s about a canard that still makes the rounds occasionally. Please, help me put this statement to sleep for once and all:

A function on the left side of the equals sign can lead to performance problems

Well, you know, it’s sort of true. But then, a function on the right side of the equals sign can also lead to performance problems. In short, it’s not the placement of the function that causes issues, it’s the function that causes issues. Let’s take a look at a really simple example:

SELECT  a.AddressID,
        a.AddressLine1,
        AddressLine2
FROM    Person.Address AS a
WHERE   a.AddressLine1 = 'Downshire Way';

This simple query results in an equally simple execution plan:

ExecSimple

Now, if we decide that we want to do something like look for all results that have ‘Way’ in them. It’s a different result set, but our index could be used for the new result set. The query will get modified to this:

SELECT  a.AddressID,
        a.AddressLine1,
        AddressLine2
FROM    Person.Address AS a
WHERE   RIGHT(a.AddressLine1, 3) = 'Way';

That’s a function on the left side of the equals sign. OMG!!1! The execution plan isn’t as nice any more:

ExecScan

So, if we change the query to this:

SELECT  a.AddressID,
        a.AddressLine1,
        AddressLine2
FROM    Person.Address AS a
WHERE   'Way' = RIGHT(a.AddressLine1, 3);

Whew, dodged a bullet since we have the function on the right side of the equals sign. And so we get a better execution plan now:

ExecScan

Uhm, wait. That’s still a bad plan isn’t it? Why yes, yes it is. That’s because the problem isn’t which side of the equals sign we have a function, but the fact that we have a function on the column at all. There are a number of permutations we can get into around this. For example, what if, instead of putting the function on the column, we put it on the string, to only match to ‘Way’ instead of ‘Downshire Way.’ Well, that would fix the function issue, but then, we’d have to use a LIKE command and add a wild card to the beginning of the string, resulting in scans again. But the fundamental concern remains, we’re not talking about the left or right of the comparison operator, we’re talking about the existence of the function on the column.

Please, don’t repeat this one any more. OK? Thanks.


 

For lots more on query tuning, let’s get together and talk. I have an all day seminar in two weeks at Connections in Las Vegas. Please go here to sign up.

Or, I’ll be doing a full day pre-conference seminar at the PASS Summit this year in Seattle. Last time I gave a similar talk it sold out, so please, if you’re interested, sign up now.

Aug 20 2014

The Red Gate Way…

SitCAs companies go, Red Gate is a little different. That is readily apparent in our tools and the philosophy behind them, ingeniously simple. But, we do a lot of other things too. There’s the Simple-Talk web site where we publish serious articles on all aspects of development and database administration across platforms and programming languages. There’s SQL Server Central, the single largest SQL Server community on the planet. There’s Ask SQL Server where you can get direct answers to your direct questions about SQL Server. If all that’s not enough, there are all the books, which we give away for free, on, again, all aspects of programming and database administration. But, we like to do more, so we also bring you training, the Red Gate way, at the SQL in the City events.

We’ve got two more SQL in the City events coming up soon. First, we’re back in London again on Friday, October 24, 2014. This event is one of my favorites, every year. We’re bringing in MVPs like Steve Jones, Ike Ellis, Brian Randell and others, all to teach you about SQL Server, but we’re doing it the Red Gate way. So please, register for this event. I’ll see you there and we can share a frothy beverage (it’s Red Gate).

Next, I’m thrilled to say that we’re going to be in Seattle on Monday, November 3, 2014. That’s right, just before the PASS Summit. If you wanted a reason to get out to Seattle early, here it is. We’re bringing a lot of the same crew from the London event over to Seattle. You’ll be able to experience what the London people did and more. This is SQL Server training done right, that is the Red Gate Way. Let’s get together and talk and share a frothy beverage in the States. It’s a free event, but there’s limited room, so please register now.

These are unique and popular events. We pull out all the stops to make them fun, special, educational, useful, helpful, doggone it, good. Please, come out, talk to me, talk to the Red Gate team, help influence the tools that you use every day, and learn about SQL Server.

Aug 14 2014

A Full Day of Query Tuning

I’m excited to able to say that I’ve been given the opportunity to put on a full day workshop at SQL Connections on Friday, September 19th, 2014. The title is “Query Performance Tuning in SQL Server 2014″, but I assure you we’re going to cover things that are applicable if you’re still working on SQL Server 2005. We’ll start the day covering the different mechanisms you have to capture query metrics. We’ll go over dynamic management objects and extended events that are incredibly important to you in understanding which queries you need to tune. We’ll get an introduction into how the optimizer works and the importance that statistics, indexes and constraints play in helping the optimizer make the choices it makes. I promise, execution plans will be covered throughout the day, in great detail, because they are a fundamental part of how you’re going to understand the choices the optimizer made. Then, we’re going to go through common problems, how you can identify them, troubleshoot them, and solve them. We’ll get rid of a lot of myths and just easily fixed issues. Throughout the day we’ll be covering both older versions of SQL Server as well as SQL Server 2014. Then, to finish out the day, we’ll go over some of the new opportunities that are unique to SQL Server 2014, their use and their shortcomings.

In short, I’m trying to take you from not knowing which queries you need to tune, to identifying those problematic queries, understanding what the problems are and how to solve them. You’re going to know where to go to get started reading execution plans. You’re going to walk away with a love and fascination for extended events. You’re going to get tools and methods that you can apply to your own code, your own applications, your own servers. And, this all takes place after an amazing week of learning at the IT/Dev Connections event in Vegas. Please click here now to register.

Jul 15 2014

Execution Plan Details

I wouldn’t say it’s common knowledge that you should look at execution plans when tuning queries, but it’s not exactly uncommon knowledge either. But, people tend to get focused on just looking at the graphical part of the plan and there’s just not enough information there. Let’s take a look at a query:

SELECT  pc.Name,
        ps.Name,
        v.Name,
        pr.ReviewerName,
        p.Name,
        v.ModifiedDate,
        p.Color
FROM    Production.Product AS p
        LEFT JOIN Production.ProductReview AS pr
        ON pr.ProductID = p.ProductID
        JOIN Production.ProductSubcategory AS ps
        ON p.ProductSubcategoryID = ps.ProductSubcategoryID
        JOIN Production.ProductCategory AS pc
        ON pc.ProductCategoryID = ps.ProductCategoryID
        JOIN Purchasing.ProductVendor AS pv
        JOIN Purchasing.Vendor AS v
        ON v.BusinessEntityID = pv.BusinessEntityID
        ON pv.ProductID = p.ProductID
WHERE   v.ModifiedDate = '2006-02-17 00:00:00.000'
AND p.Color LIKE 'Y%';

This generates an execution plan that looks like this:

PlanDetails

Neither v.ModifiedDate nor p.Color have indexes. Yet, we only see a single scan in this plan, on the BusinessEntity.Vendor table. Why? Well, we can’t tell from the GUI directly, so, you have to look to the tool tips or the properties. The tool tip in this case actually proves helpful, as does the properties:

PlanDetailsProperties

In short, the clustered index is used to seek out a number of rows and then a secondary predicate is placed on those few rows to further filter the results. You would never even guess at that using just the GUI alone. You have to look to the details of the execution plan to understand that. There are lots of other examples where you can only get the information from the properties because not everything is available in the tooltip. This can include things like Optimization Level, Reason For Early Termination, Scan Direction, and a bunch of other things. Just remember to drill down to the properties in order to better understand your execution plans.

Why do you need information like this? Are we going to tune this query? Maybe we could help it by adding an index t the Vendor table. But, at least for this query with this date range, it appears we don’t need an index on the Product table. But, that may change depending on the number of rows returned from the Vendor table. A different data set can result in an entirely different execution plan with entirely different performance characteristics.

Query tuning is tough. That’s why I’ve put together a one-day seminar to get you started on it. There’s still room in Albany, on July 25th. You can register here. I’ll also be teaching at SQL Connections in September in Las Vegas. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on just execution plans at SQL Server Days. Go here to register for this event. When I did a similar session at the PASS Summit two years ago, it sold out. I’ll be presenting at Summit this year in Seattle in November. Go here to register. But do it early or you may be put on a waiting list.

 

 

 

 

Mar 19 2014

Query Tuning Near You

It really is so much easier to just throw hardware at badly performing databases. Just buy a bigger, faster server with more and faster disks and you can put off doing tuning work for another 6-9 months, easily. But, for most of us, sooner or later, our performance problems get so big or, we just don’t have any more money to spend, and we’re stuck. We have to tune the queries. And frankly, query tuning is a pain in the nether regions.

But, after you’ve tuned queries 20 or 30 times, you start to recognize the patterns and it gets easier (never easy, just not as hard). But, if you haven’t done it 20 or 30 times, what do you do? My suggestion, talk to someone who has done it 30 times (or even a couple of hundred times), like me for example.

I have an all day session on tuning queries. It goes from understanding how the optimizer works (which will automatically lead you to write better queries), to how to gather performance metrics (so you know where the pain points are located), to reading execution plans (you need to know what has gone wrong with the query) to various mechanisms for fixing the query. This information is applicable to systems from SQL Server 2005 to SQL Server 2014 (sorry everyone still on 2000, it’s time to upgrade). The session is based on the books I’ve written about query tuning and execution plans, plus years and years of doing lots of query tuning.

Right now I’ve got two events scheduled. Before SQL Saturday #286 in Louisville, KY, I’ll be putting on this precon. We’re limited to seating, so don’t wait. You can go here to register. Then we can get together the next day at the SQL Saturday event to get some more education from all the great speakers there. Next, before SQL Saturday #302 in Albany, NY (their first one, ever), I’ll be hosting this. You can register by clicking here. Don’t miss the early bird special. Again, the next day will be filled with learning at the SQL Saturday event.

I’m working on taking this to other locations and venues. If you’re interested, please get in touch. I’ll do what I can to come to you.

If you have a particularly thorny query, bring it along with an actual execution plan. If we have time at the end of the day, I’ll take a look and makes suggestions, live (uh, please, no sensitive patient data or anything like that).

Let’s get together and talk query tuning.

Mar 18 2014

Finding Mistakes

Ever had that moment where you start getting errors from code that you’ve tested a million times? I had that one recently. I had this little bit of code for pulling information directly from query plans in cache:

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
QueryPlans AS
(
SELECT RelOp.pln.value(N'@PhysicalOp', N'varchar(50)') AS OperatorName,
RelOp.pln.value(N'@NodeId',N'integer') AS NodeId,
RelOp.pln.value(N'@EstimateCPU', N'decimal(10,9)') AS CPUCost,
RelOp.pln.value(N'@EstimateIO', N'decimal(10,9)') AS IOCost,
dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY deqp.query_plan.nodes(N'//RelOp') RelOp (pln)
)

SELECT  qp.OperatorName,
        qp.NodeId,
        qp.CPUCost,
        qp.IOCost,
        qp.CPUCost + qp.IOCost AS EstimatedCost
FROM    QueryPlans AS qp
WHERE   qp.text = 'some query or other in cache'
ORDER BY EstimatedCost DESC;

I’ve probably run this… I don’t know how many times. But… I’m suddenly getting an error:

Msg 8114, Level 16, State 5, Line 7
Error converting data type nvarchar to numeric.

What the hell? There is no where this should be occurring. I dig through the query over and over and I can’t figure it out. Until… I finally notice that one of the operators in the plan has the CPUCost value stored as a float:

EstimateCPU=”1e-006″

Ummmm, since when? Since forever. I’ve just been lucky with my code. I’d just never hit a sufficiently small cost in the plans before. I hadn’t bothered to look for the actual data type in use in the schema definition, although it’s right there:

<xsd:attribute name=”EstimateCPU” type=”xsd:double” use=”required”/>

 

Long-Kiss-Goodnight-samuel-l-jackson-14034435-720-480

I never did one thing right in my life, you know that? Not one. That takes skill.

What did I do wrong? I was looking at the data output from the queries and in the plans rather than looking at the structure to know what to expect. It’s the kind of thing I would never do with T-SQL. I would always look to the table structure to know what data type a given column was. But in this case, with the XML, I looked at the data and made an assumption. And we all knows what that means. It makes an ass out of you and mption.

Or, in this case, me and mption.

Anyway, the corrected query is a pretty trivial change:

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
QueryPlans AS
(
SELECT RelOp.pln.value(N'@PhysicalOp', N'varchar(50)') AS OperatorName,
RelOp.pln.value(N'@NodeId',N'integer') AS NodeId,
RelOp.pln.value(N'@EstimateCPU', N'float') AS CPUCost,
RelOp.pln.value(N'@EstimateIO', N'float') AS IOCost,
dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY deqp.query_plan.nodes(N'//RelOp') RelOp (pln)
)

SELECT  qp.OperatorName,
        qp.NodeId,
        qp.CPUCost,
        qp.IOCost,
        qp.CPUCost + qp.IOCost AS EstimatedCost
FROM    QueryPlans AS qp
WHERE   qp.text = 'SELECT * FROM HumanResources.vEmployee AS ve'
ORDER BY EstimatedCost DESC;

But I do feel bad if anyone has been using this the way I showed it. ‘Cause, yeah, I’ve demonstrated with this code in the past. Oops. However, great point. Especially when working with a public XML schema like this, it pays to go and look at that schema the same way you would a table in order to ensure that you’re using the correct data types.

Mar 11 2014

sp_updatestats Is Not Smart

No, I don’t mean the use of sp_updatestats is not smart. It’s a fine, quick mechanism for getting statistics updated in your system. But the procedure itself is not smart. I keep seeing stuff like “sp_updatestats knows which statistics need to be updated” and similar statements.

Nope.

Not true.

Wanna know how I know? It’s tricky. Ready? I looked at the query. It’s there, in full, at the bottom of the article (2014 CTP2 version, just in case yours is slightly different, like, for example, no Hekaton logic). Let’s focus on just this bit:

if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))

The most interesting part is right at the front, @ind_rowmodctr <> 0. That value is loaded with the cursor and comes from sys.sysindexes and the rowmodctr column there. In short, we can know that the “smart” aspect of sp_updatestats is that it assumes if there are any modifications, then updating statistics is good to go. We can argue for hours over how exactly you determine whether or not statistics are sufficiently out of date to warrant an update, but I’d be willing to bet that the sophisticated answers are not going to include just finding everything that’s been touched.

Now, don’t get me wrong. I’m not implying, suggesting or stating that sp_updatestats shouldn’t be used. It should. It’s fine. Just be very clear about what it does and how it does it.

 

Just a reminder, I’m putting on an all day seminar on query tuning in Louisville on June 20th, 2014. Seats are going fast, so please sign up early.

sp_updatestats:

USE [master]
GO
/****** Object:  StoredProcedure [sys].[sp_updatestats]    Script Date: 3/6/2014 8:09:58 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER procedure [sys].[sp_updatestats]
	@resample char(8)='NO'
as

	declare @dbsid varbinary(85)

	select @dbsid = owner_sid
		from sys.databases
		where name = db_name()

	-- Check the user sysadmin
	if not is_srvrolemember('sysadmin') = 1 and suser_sid() <> @dbsid
	begin
		raiserror(15247,-1,-1)
		return (1)
	end
	-- cannot execute against R/O databases  
	if DATABASEPROPERTYEX(db_name(), 'Updateability')=N'READ_ONLY'
	begin
		raiserror(15635,-1,-1,N'sp_updatestats')
		return (1)
	end

	if upper(@resample)<>'RESAMPLE' and upper(@resample)<>'NO'
	begin
		raiserror(14138, -1, -1, @resample)
		return (1)
	end

	-- required so it can update stats on ICC/IVs
	set ansi_warnings on
	set ansi_padding on
	set arithabort on
	set concat_null_yields_null on
	set numeric_roundabort off

	declare @exec_stmt nvarchar(4000)		-- "UPDATE STATISTICS [sysname].[sysname] [sysname] WITH RESAMPLE NORECOMPUTE"
	declare @exec_stmt_head nvarchar(4000)	-- "UPDATE STATISTICS [sysname].[sysname] "
	declare @options nvarchar(100)			-- "RESAMPLE NORECOMPUTE"

	declare @index_names cursor

	declare @ind_name sysname
	declare @ind_id int
	declare @ind_rowmodctr int
	declare @updated_count int
	declare @skipped_count int

	declare @sch_id int
	declare @schema_name sysname
	declare @table_name sysname
	declare @table_id int
	declare @table_type char(2)
	declare @schema_table_name nvarchar(640) -- assuming sysname is 128 chars, 5x that, so it's > 128*4+4

	declare @compatlvl tinyint

	-- Note that we looked up from sys.objects$ instead sys.objects since some internal tables are not visible in sys.objects
	declare ms_crs_tnames cursor local fast_forward read_only for
		select name, object_id, schema_id, type from sys.objects$ o
		where o.type = 'U' or o.type = 'IT'
	open ms_crs_tnames
	fetch next from ms_crs_tnames into @table_name, @table_id, @sch_id, @table_type

	-- determine compatibility level
	select @compatlvl = cmptlevel from sys.sysdatabases where name = db_name()

	while (@@fetch_status <> -1) -- fetch successful
	begin
		-- generate fully qualified quoted name
		select @schema_name = schema_name(@sch_id)
		select @schema_table_name = quotename(@schema_name, '[') +'.'+ quotename(rtrim(@table_name), '[')

		-- check for table with disabled clustered index
		if (1 = isnull((select is_disabled from sys.indexes where object_id = @table_id and index_id = 1), 0))
		begin
			-- raiserror('Table ''%s'': cannot perform the operation on the table because its clustered index is disabled', -1, -1, @tablename)
			raiserror(15654, -1, -1, @schema_table_name)
		end
		else
		begin
			-- filter out local temp tables
			if ((@@fetch_status <> -2) and (substring(@table_name, 1, 1) <> '#'))
			begin
				-- reset counters for this table
				select @updated_count = 0
				select @skipped_count = 0

				-- print status message
				--raiserror('Updating %s', -1, -1, @schema_table_name)
				raiserror(15650, -1, -1, @schema_table_name)

				-- initial statement preparation: UPDATE STATISTICS [schema].[name]
				select @exec_stmt_head = 'UPDATE STATISTICS ' + @schema_table_name + ' '

				-- using another cursor to iterate through
				-- indices and stats (user and auto-created)
				-- Hekaton indexes do not appear in sys.sysindexes so we need to use sys.stats instead
				-- Hekaton indexes do not support rowmodctr so we just return 1 which will force update stats
				-- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables
				if ((@table_type = 'U') and (1 = OBJECTPROPERTY(@table_id, 'TableIsMemoryOptimized')))	-- Hekaton tables
				begin
					set @index_names = cursor local fast_forward read_only for
						select name, stats_id, 1 as rowmodctr
						from sys.stats
						where object_id = @table_id and indexproperty(object_id, name, 'ishypothetical') = 0 
						order by stats_id
				end
				else
				begin
					set @index_names = cursor local fast_forward read_only for
						select name, indid, rowmodctr from sys.sysindexes
						where id = @table_id and indid > 0 and indexproperty(id, name, 'ishypothetical') = 0 
						and indexproperty(id, name, 'iscolumnstore') = 0
						order by indid
				end

				open @index_names
				fetch @index_names into @ind_name, @ind_id, @ind_rowmodctr

				-- if there are no stats, skip update
				if @@fetch_status < 0
					--raiserror('    %d indexes/statistics have been updated, %d did not require update.', -1, -1, @updated_count, @skipped_count)
					raiserror(15651, -1, -1, @updated_count, @skipped_count)
				else 
				begin
					while @@fetch_status >= 0
					begin
						-- create quoted index name
						declare @ind_name_quoted nvarchar(258)
						select @ind_name_quoted = quotename(@ind_name, '[')

						-- reset options
						select @options = ''

						declare @is_ver_current bit
						select @is_ver_current = stats_ver_current(@table_id, @ind_id)

						-- note that <> 0 should work against old and new rowmodctr logic (when it is always > 0)
						-- also, force a refresh if the stats blob version is not current
						if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))
						begin
							select @exec_stmt = @exec_stmt_head + @ind_name_quoted

							-- Add FULLSCAN for hekaton tables
							-- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables
							if ((@table_type = 'U') and (1 = OBJECTPROPERTY(@table_id, 'TableIsMemoryOptimized')))	-- Hekaton tables
								select @options = 'FULLSCAN'

							-- add resample if needed
							else if (upper(@resample)='RESAMPLE')
								select @options = 'RESAMPLE '

							if (@compatlvl >= 90)
								-- put norecompute if local properties are set to AUTOSTATS = OFF
								-- note that ind name is unique within the object
								if ((select no_recompute from sys.stats where object_id = @table_id and name = @ind_name) = 1)
								begin
									if (len(@options) > 0) select @options = @options + ', NORECOMPUTE'
									else select @options = 'NORECOMPUTE'
								end

							if (len(@options) > 0)
								select @exec_stmt = @exec_stmt + ' WITH ' + @options

							--print @exec_stmt
							exec (@exec_stmt)
							--raiserror('    %s has been updated...', -1, -1, @ind_name_quoted)
							raiserror(15652, -1, -1, @ind_name_quoted)
							select @updated_count = @updated_count + 1
						end
						else
						begin
							--raiserror('    %s, update is not necessary...', -1, -1, @ind_name_quoted)
							raiserror(15653, -1, -1, @ind_name_quoted)
							select @skipped_count = @skipped_count + 1
						end
						fetch @index_names into @ind_name, @ind_id, @ind_rowmodctr
					end
					--raiserror('    %d index(es)/statistic(s) have been updated, %d did not require update/disabled.', -1, -1, @updated_count, @skipped_count)
					raiserror(15651, -1, -1, @updated_count, @skipped_count)
				end
				deallocate @index_names
			end
		end
		print ' '
		fetch next from ms_crs_tnames into @table_name, @table_id, @sch_id, @table_type
	end
	raiserror(15005,-1,-1)
	deallocate ms_crs_tnames
	return(0) -- sp_updatestats
Feb 27 2014

Let’s Talk Query Tuning

I spend quite a bit of time writing about query tuning on this blog. I’ve written (re-written and am actively re-writing) books on query tuning. But what I like most is talking about query tuning. I love giving sessions at various events on different aspects of query tuning, but, what I like the most is spending a whole day, trying to do a complete brain dump to get as much information out there as possible. Sound attractive? Then I’ve got a great deal for you. Come to Louisville on June 20th, 2014. We will talk query tuning at length. You have a specific question? Let’s get it answered. Then, the next day, we can all go to SQL Saturday 286 there in Louisville to get more learning and some serious networking. What’s not to like?

Feb 18 2014

The CASE Statement and Performance

In case you don’t know, this query:

UPDATE dbo.Test1
SET C2 = 2
WHERE C1 LIKE '%33%';

Will run quite a bit slower than this query:

UPDATE dbo.Test1
SET C2 = 1
WHERE C1 LIKE '333%';

Or this one:

UPDATE dbo.Test1
SET C2 = 1
WHERE C1 = '333';

That’s because the second two queries have arguments in the filter criteria that allow SQL Server to use the statistics in an index to look for specific matching values and then use the balanced tree, B-Tree, of the index to retrieve specific rows. The argument in the first query requires a full scan against the index because there is no way to know what values might match or any path through the index to simply retrieve them.

But, what if we do this:

UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '19%' THEN 3
WHEN C1 LIKE '25%' THEN 2
WHEN C1 LIKE '37%' THEN 1
END;

We’re avoiding that nasty wild card search, right? So the optimizer should just be able to immediately find those values and retrieve them… Whoa! Hold up there pardner. Let’s set up a full test:

IF (SELECT  OBJECT_ID('Test1')
   ) IS NOT NULL 
    DROP TABLE dbo.Test1; 
GO
CREATE TABLE dbo.Test1 (C1 VARCHAR(50),C2 INT, C3 INT IDENTITY);

SELECT TOP 1500
        IDENTITY( INT,1,1 ) AS n
INTO    #Nums
FROM    Master.dbo.SysColumns sC1,
        Master.dbo.SysColumns sC2;
        
INSERT  INTO dbo.Test1
        (C1,C2)
        SELECT  n, n
        FROM    #Nums;
        
DROP TABLE #Nums;

CREATE CLUSTERED INDEX i1 ON dbo.Test1 (C1) ;


UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '%42%' THEN 3
WHEN C1 LIKE '%24%' THEN 2
WHEN C1 LIKE '%36%' THEN 1
END

DBCC FREEPROCCACHE()

UPDATE dbo.test1
SET C2 =
CASE
WHEN C1 LIKE '19%' THEN 33
WHEN C1 LIKE '25%' THEN 222
WHEN C1 LIKE '37%' THEN 11
WHEN C1 LIKE '22%' THEN 5
END

I added the extra CASE evaluation in the second query in order to get a different query hash value.

Here are the execution plans from the two queries:

TwoPlans

They’re pretty identical. Well, except for me forcing a difference in the hash values, they’re identical except for the details in the Compute Scalar operator. So what’s going on? Shouldn’t that second query use the index to retrieve the values? After all, it avoided that nasty comparison operator, right? Well, yes, but… we introduced a function on the columns. What function you ask? The CASE statement itself.

This means you can’t use a CASE statement in this manner because it does result in bypassing the index and statistics in the same way as using functions against the columns do.