Category: SQL Server 2014

May 05 2015

Effects of Persisted Columns on Performance

I live for questions. And my favorite questions are the ones where I’m not completely sure of the answer. Those are the questions that make me stop presenting in order to take a note so I can try to answer the question later, usually in a blog post. Guess where we are today?

I was asked at SQL Bits in London about the direct impact of the PERSISTED operator on calculated columns, both inserts and selects. I didn’t have a specific answer, so I wrote it down for later (and asked the self-described “persisting Dane” to email me to remind me). He did, so I put together a few tests to try to answer his question.

First, I created three tables:

CREATE TABLE dbo.PersistTest (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

CREATE TABLE dbo.PersistTest2 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2)
);
GO

CREATE TABLE dbo.PersistTest3 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

The first table has two calculated columns, one PERSISTED and one not. The second table has a non-persisted column only. The third table has a persisted column only. This way, I can see the behavior across all these tables and compare them to see where PERSISTED adds to, or removes from, performance.

I took a very simple set of queries and ran these a few times on each table to get an average execution time:

INSERT dbo.PersistTest (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);
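The matching statements for the other two tables are the same INSERT pointed at each table in turn; a minimal sketch of those companion statements (using the same hard-coded values) looks like this:

INSERT dbo.PersistTest2 (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);
GO

INSERT dbo.PersistTest3 (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);
GO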

The results were:

Persist    Persist 2    Persist 3
3580.5     1626         2260.5
(average insert time in microseconds)

On average, across about five executions, you can see a distinct difference of about 600 microseconds between Persist 2 and Persist 3, and an even higher average for the Persist table, which has both calculated columns. Let’s try the same thing with a few SELECT statements:

SELECT  pt.CalcValue,
        pt.PersistValue
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.CalcValue
FROM    dbo.PersistTest2 AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.PersistValue
FROM    dbo.PersistTest3 AS pt
WHERE   pt.PersistTestID = 3;
GO

The results were less spectacular:

Persist    Persist 2    Persist 3
115.5      123.5        109.5
(average select time in microseconds)

You can see that making the calculation on the fly is more costly, with both Persist and Persist 2 coming in higher than Persist 3, which had no calculation to perform in order to retrieve the data.

Now, let’s run the same thing, but, how about, 10,000 times so we can really see what the differences are over time. First, the inserts:

Persist    Persist 2    Persist 3
84854      68783        73817
(cumulative insert time in microseconds over 10,000 executions)

Still measured in microseconds, you can see that the cumulative totals are higher for the persisted columns than they are for the non-persisted column. So, there is an overhead for storing the extra information. But, what about queries?

             Persist      Persist 2    Persist 3
Average      221.25       222.16       100.54
Cumulative   2203431      2254238      1005469
(select times in microseconds over 10,000 executions)

I’d say that’s pretty conclusive. When I hit the SELECT statement 10,000 times instead of just five or six, we clearly have differences in behavior. Interestingly enough, the logical reads were identical at 20,000 each. So it really did come down to the time spent calculating the values versus the time spent just retrieving them.
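If you want to pull similar numbers for your own runs without adding measurement overhead to each execution, one option is to read them out of the plan cache afterwards. This is only a sketch; the LIKE filter on the query text is just an illustrative way to find the test statements, and total_elapsed_time is reported in microseconds:

SELECT  dest.text,
        deqs.execution_count,
        deqs.total_elapsed_time,
        deqs.total_elapsed_time / deqs.execution_count AS avg_elapsed_time,
        deqs.total_logical_reads
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text LIKE '%PersistTest%';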

Caveats. This is a very simple test with simple calculations and small persisted values. Even doing something 10,000 times isn’t the same as doing it one million or one hundred million. In short, your mileage may vary, some assembly required, for queries lasting longer than four hours call your physician, it depends.

You can see that a PERSISTED column is going to negatively impact INSERT, but it’s going to positively impact SELECT. The question is, which one is giving you pain at the moment?

UPDATE: Transposed the data in the final table between Persist 2 & Persist 3. Fixed it.

Apr 07 2015

Error: Unknown Property ismemoryoptimized

If you’re starting the process of moving your databases in Azure SQL Database to v12, you need to do one thing. Make sure you have SQL Server 2014 CU5 or better installed on your machine with Management Studio (current list of updates). Otherwise, like me, you’ll be looking at the above error.

Just a quick blog post to help out. I saw this error, did a search, and didn’t hit a single entry telling me what to do. I started the install of CU6 (I needed to catch up on cumulative updates anyway). While that was happening, I went to Twitter and posted to #sqlhelp to see if anyone else had hit this. I got a response from Amit Banerjee pointing me to this MSDN blog on the topic, reinforcing the decision I had already made. I just wish they had posted the error along with everything else in the blog post. It would make things easier.

Apr 06 2015

Constraints and SELECT Statements

I’ve posted previously about how a foreign key constraint can change how a SELECT query behaves. Logically that just makes sense. But other types of constraints don’t affect execution plans do they?

Yes.

Let’s take this constraint as an example:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK 
ADD  CONSTRAINT CK_SalesOrderDetail_UnitPrice 
CHECK  ((UnitPrice>=(0.00)))

That will ensure that no values less than zero can slip in there. We can even validate it:

INSERT Sales.SalesOrderDetail
        (SalesOrderID,
         CarrierTrackingNumber,
         OrderQty,
         ProductID,
         SpecialOfferID,
         UnitPrice,
         UnitPriceDiscount,
         rowguid,
         ModifiedDate
        )
VALUES  (60176, -- SalesOrderID - int
         N'XYZ123', -- CarrierTrackingNumber - nvarchar(25)
         1, -- OrderQty - smallint
         873, -- ProductID - int
         1, -- SpecialOfferID - int
         -22, -- UnitPrice - money
         0.0, -- UnitPriceDiscount - money
         NEWID(), -- rowguid - uniqueidentifier
         GETDATE()  -- ModifiedDate - datetime
        );

Will give me an error:

Msg 547, Level 16, State 0, Line 470
The INSERT statement conflicted with the CHECK constraint "CK_SalesOrderDetail_UnitPrice". The conflict occurred in database "AdventureWorks2014", table "Sales.SalesOrderDetail", column 'UnitPrice'.

Let’s look at a SELECT query now. If we run this:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.';

The resulting execution plan looks like this:

[Execution plan image: Constraint]

But, if I modify the query to look like this, adding an additional AND filter on the constrained UnitPrice column:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.'
        AND sod.UnitPrice > $0.0;

You know what happens to the execution plan? Nothing. It stays exactly the same. The optimizer knows that, in order to satisfy the query, it can safely ignore the change in the WHERE clause. In fact, you can look at the SELECT operator properties for the two different plans and note that while the Query Hash value changes, the Plan Hash value stays the same. The plans are identical.
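If you’d rather not eyeball the SELECT operator properties, you can confirm the hash values from the plan cache as well. A quick sketch, assuming both statements are still in cache and using a simple text filter to find them:

SELECT  dest.text,
        deqs.query_hash,
        deqs.query_plan_hash
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text LIKE '%Water Bottle%';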

With that knowledge, I’m going to modify the query to look like this, reversing the UnitPrice reference to look for data that violates the constraint:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.'
        AND sod.UnitPrice < $0.0;

And now we have a new execution plan:

[Execution plan image: Constraint_scan]


The optimizer recognized that there is no way that any data can be returned with the WHERE clause above because there is an enforced constraint (note the use of the WITH CHECK clause on the constraint). This completely changes the execution plan in every possible way. Now, instead of attempting to access the data, a Constant Scan operator is put in as a placeholder for an empty result set.

To sum up: yes, constraints absolutely affect the choices made by the optimizer when those constraints would have an effect on the plan, even for a SELECT query. I would also argue that this means the use of enforced constraints can be a performance enhancer, since the optimizer can make intelligent choices about how a given query is dealt with.

UPDATE: Fixed a typo that said the constraint prevented data that was equal to or less than zero. It’s only for data less than zero.

Mar 30 2015

Does the New Cardinality Estimator Reduce Bad Parameter Sniffing

[Image: parameter sniffing, via Flickr: https://www.flickr.com/photos/nalbertini/6224914311/]

No. Next question.

Although, that answer can be slightly, ever so slightly, nuanced…

Parameter sniffing is a good thing. But, like a good wine, parameter sniffing can go bad. It always comes down to your statistics. A very accurate set of statistics with very little data skew (some values that have radically more/less data than other values) and a very even distribution (most values have approximately similar cardinality), and parameter sniffing is your bestest buddy on the planet (next to a tested backup). But, introduce some data skew, let the stats get wildly out of date, or suffer from seriously uneven distribution, and suddenly your best friend is doing unspeakable things to your performance (kind of like multi-statement table valued user defined functions).

SQL Server 2014 has the first upgrade to the cardinality estimator since SQL Server 7. To get the best possible understanding of what that means and how it works, I can only recommend Joseph Sack’s white paper. It’s THE source of good information. All the new functionality within the estimator, for the most part, in most situations, will probably lead to more accurate row estimates. This means, in theory, less bad parameter sniffing. But, if you are benefiting from the old estimation engine, and in some cases you absolutely are, then you may see an increase in bad parameter sniffing. It comes right back around to your statistics and your data distribution.
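If you want to see how a specific query behaves under each model, you can flip between the estimators at the query level with the documented trace flags (9481 forces the legacy estimator, 2312 forces the new one). A sketch; the query itself is just an arbitrary AdventureWorks example:

SELECT  soh.OrderDate,
        sod.OrderQty
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
WHERE   soh.CustomerID = 29825
OPTION  (QUERYTRACEON 9481); -- legacy cardinality estimator
-- swap in OPTION (QUERYTRACEON 2312) to force the new estimator
GO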

So, does the new cardinality estimator lead to a reduction in bad parameter sniffing? It still mostly depends on your statistics. So, I can’t say with certainty that the new cardinality estimator in SQL Server 2014 literally reduces bad parameter sniffing. In many instances, probably even most, the changes to the model are going to result in better row estimates, which means better choices by the optimizer. In some instances this will reverse and you may see pain, directly related to the changes in the cardinality estimator. But, overall, the fundamental behavior of parameter sniffing, again, a very positive thing, is unchanged. So, to a large degree, at least in my mind, no, the new cardinality estimator doesn’t reduce bad parameter sniffing because, most of the time, it’s due to the three causes mentioned before: out of date statistics, data skew, and uneven data distribution. The new cardinality estimator doesn’t change these fundamentals, so, mostly, it won’t reduce bad parameter sniffing.

Mar 24 2015

Simple Parameterization and Data Types

Simple parameterization occurs when the optimizer determines that a query would benefit from a reusable plan, so it takes the hard-coded values and converts them to a parameter. Great stuff. But…

Let’s take this example. Here’s a very simple query:

SELECT ct.*
FROM    Person.ContactType AS ct
WHERE   ct.ContactTypeID = 7;

This query results in simple parameterization and we can see it in the SELECT operator of the execution plan:

[Execution plan image: param]

We can also see the parameter that was defined in use in the predicate of the seek operation:

[Execution plan image: predicate]

Hang on.

Who the heck put the wrong data type in there that’s causing an implicit conversion? The query optimizer did it. Yeah. Fun stuff. If I change the predicate value to 7000 or 700000, I’ll get two more plans, and I can see them all by querying the cache. But here’s a fun little bit: I just searched the XML of the plan, and the data type isn’t stored anywhere that I can see. But, if you query the plan cache, look at what the text of the query is:

(@1 tinyint)SELECT [ct].* 
FROM [Person].[ContactType] [ct] 
WHERE [ct].[ContactTypeID]=@1
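For reference, here’s a sketch of the kind of plan cache query that returns that text; auto-parameterized statements show up with an objtype of 'Prepared':

SELECT  dest.text,
        decp.objtype,
        decp.usecounts
FROM    sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE   dest.text LIKE '%ContactType%'
        AND decp.objtype = 'Prepared';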

Now, in this case, this type of conversion doesn’t hurt performance in any way. We’re still getting an index seek. I also ran some tests with strings. The optimizer was setting them to VARCHAR(8000), and I got an implicit conversion that, again, didn’t hurt performance. But I wouldn’t be completely shocked if, at some point, that led to scans. What I find really interesting is that the data type isn’t stored with the plan. I fully expected it to be there.

Mar 12 2015

Monitoring for Timeouts

The question came up at SQL Rally, “Can you use Extended Events to monitor for query timeouts?”

My immediate response was yes… and then I stood there trying to think of exactly how I’d do it. Nothing came quickly to mind. So, I promised to track down the answer and post it to the blog.

My first thought is to use the Causality Tracking feature to find all the places where you have a sql_batch_starting without a sql_batch_completed (or the same thing with rpc calls). And you know what, that would work. But, before I got too deep in trying to write the query that would find all the mismatched attach_activity_id values that have a sequence of 1, but not one of 2, I did some additional reading. Seems there’s another little trick that works even better (sort of). It’s to set up an Event Pairing Target.

This is pretty slick. Here’s the code I used to set up for “timeouts” in SQL Server Management Studio:

CREATE EVENT SESSION [Timeouts] ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.session_id)
    WHERE ([result] <> (2))),
ADD EVENT sqlserver.sql_batch_starting (
    ACTION (sqlserver.session_id))
ADD TARGET package0.pair_matching (SET begin_event = N'sqlserver.sql_batch_starting',
                                   begin_matching_actions = N'sqlserver.session_id',
                                   begin_matching_columns = N'batch_text',
                                   end_event = N'sqlserver.sql_batch_completed',
                                   end_matching_actions = N'sqlserver.session_id',
                                   end_matching_columns = N'batch_text')
WITH (MAX_MEMORY = 4096 KB,
      EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
      MAX_DISPATCH_LATENCY = 30 SECONDS,
      MAX_EVENT_SIZE = 0 KB,
      MEMORY_PARTITION_MODE = NONE,
      TRACK_CAUSALITY = ON,
      STARTUP_STATE = OFF)
GO
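One thing the script above doesn’t do is start the session; since STARTUP_STATE is OFF, you’ll need to kick it off yourself before running any tests:

ALTER EVENT SESSION [Timeouts] ON SERVER
STATE = START;
GO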

A couple of notes. You see the “timeout” in quotes above. That’s because I wasn’t getting a timeout in SSMS, despite setting one. Rather than muck around (I’ll figure that out later), I just set off a long-running query and then stopped it, but I put a filter in place to not capture batches that terminated from an Abort command, [result] <> 2. But that did it. As soon as I aborted, I could use the query supplied at the link for the Event Pairing Target to immediately see my unmatched pair. That’s quick and easy timeout monitoring using Extended Events. Output here:

<event name="sql_batch_starting" package="sqlserver" timestamp="2015-03-10T17:25:14.089Z">
    <data name="batch_text">
      <type name="unicode_string" package="package0" />
      <value>SELECT  *
FROM    Production.ProductCostHistory AS pch,
        Production.BillOfMaterials AS bom,
        Person.ContactType AS ct;</value>
    </data>
    <action name="session_id" package="sqlserver">
      <type name="uint16" package="package0" />
      <value>63</value>
    </action>
    <action name="attach_activity_id_xfer" package="package0">
      <type name="activity_id_xfer" package="package0" />
      <value>1BF4899F-4B74-434B-9CDE-2B646A560FB1-0</value>
    </action>
    <action name="attach_activity_id" package="package0">
      <type name="activity_id" package="package0" />
      <value>4B92C757-F780-4D8C-B1DA-F95C206A32EE-1</value>
    </action>
  </event>

So, here’s the biggest question, would I actually use this in production?

I’m a little nervous about using the batch_text as a matching point. With rpc_completed you could just use session_id and object_name or object_id. That’d be great. So, if I were going to use this in production, I’d probably only monitor for certain databases, putting tighter filtering in place for batch commands. But, other than that, I can query the target and pull out any timeout objects, so this is actually pretty slick and works well.
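For reference, the raw target data comes back through the DMVs as XML; a minimal sketch of pulling it for this session (the query at the Event Pairing Target link then shreds this XML to list the orphaned, unmatched events):

SELECT  CAST(st.target_data AS XML) AS target_data
FROM    sys.dm_xe_sessions AS s
JOIN    sys.dm_xe_session_targets AS st
        ON st.event_session_address = s.address
WHERE   s.name = N'Timeouts'
        AND st.target_name = N'pair_matching';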

The only shortcoming I see is that you can’t add two pair matching targets in a session, so you would need one session for batch timeouts and another for rpc timeouts. Not the end of the world, but worth noting.



Jan 19 2015

Understand the True Source of Problems

There’s an old joke that goes, “Doctor, doctor, it hurts when I do this,” while the person in question swings their arm over their head. The doctor’s response is, “Don’t do that.” Problem solved, right? Well, maybe not. Let’s take a quick example from life. I do crossfit (yeah, I’m one of those; pull up a chair and I’ll tell you all about my clean & jerk progress… kidding). I’ve been experiencing pain in my shoulder. “It hurts when I do this.” But I’m not going to stop. I’ve been working with my coach to identify where the pain is and what stretches and warm-ups I can do to get around it (assuming it’s not a real injury, and it isn’t). In short, we’re identifying the root cause and addressing the issue rather than just coming up with a “don’t do that” style solution.

As is usual with one of my rambling intros, I’m going to tie this back to our work as data professionals, hang on.

I recently ran into a set of requirements that a new DBA was told are “best practices” and that he had to comply with. I’m editing them in order to save anyone embarrassment (although, if you wrote these, you should be embarrassed). I think that each one represents a moment of “it hurts” followed by “don’t do that” which, as you’re going to see, is absolutely the wrong response. As crazy as they all are, it’s not the first time I’ve seen them. This information is clearly coming from some fever-filled internet-swamp where stupid mutates and grows. Let’s nip these in the bud.

1 – All queries against the production database must be executed by using RDP to connect to the production system. This is because connecting through SSMS will cause the server to crash.

I couldn’t even begin to guess what happened that caused a system outage because of a query from SSMS, but I suspect it was some sort of crazy, extreme outlier of a query that probably didn’t run much better while RDPed into the server, but somehow avoided a network issue (or six). Who knows. Suffice to say, no. This is crazy. On multiple levels. The most important being, you’re giving people access to the operating system on production that really probably shouldn’t have it. And, you can only have one person connecting to production at any one time. Two teams ready for a deployment? Tough. Oh, and your production system, in addition to all the work it’s doing for SQL Server, it now has to handle all the work of SSMS and displaying the results of your queries. Was your server under stress when you started? It’s worse now. I really don’t know of anyone, including a very large number of remote DBAs, who don’t connect to SQL Server through SSMS running on their desktop.

2 – NOLOCK is required on every query. It makes things run faster.

The magic Turbo Button of SQL Server rears its ugly head, yet again. I suspect that if you went and looked at all the responses to questions on forums, blog posts, articles, and presentations, there are more places defining exactly why this is bad than almost any topic except backups. One more time: using NOLOCK on all your queries… and if you’re putting it everywhere anyway, why not just use READ UNCOMMITTED on your connections, it’s easier… Anyway, using NOLOCK on your queries results in dirty reads. Yes, everyone says, “Oh, that means if someone is changing ‘dog’ to ‘cat’ I could see either ‘dog’ or ‘cat’. I don’t care.” Right. That’s true. It also means that when pages split and rearrange, you may miss rows or have rows duplicated. All the businesses I know love it when data is missing from the queries they’re running. I’m sure you wouldn’t mind if your bank used NOLOCK and told you that you didn’t have any money in it, right?

3 – Transactions take the server offline. Never use them in queries.

Again, I’m having a hard time imagining what exactly occurred to make this seem like a good idea. You do know that every query has an implicit transaction? And your server is still online. Can you overuse transactions? Sure: your transactions can be overly large, they can run too long, or you can have too many of them. But no, just using a transaction will not bring the server down.

4 – Do not JOIN tables because it leads to performance problems.

I’m actually used to a slight variation on this, “Never join more than three tables.” This is just a holdover from the old days when several things were going on. One, the old query optimizers just sucked, so the choices between loop, merge, and hash joins were really bad. We also didn’t really know how to index our tables properly back in the day (well, I didn’t), so it hurt performance a lot to join between tables. Our code was horrendous back in the day too, so it didn’t help. But, since at least SQL Server 2000, the optimizer is good and performance is fine. I saw an 86-table join (not one I wrote) run in under 200ms on SQL Server 2000 on ancient processors and small memory. Talk about completely busting that myth. This rule is crazy. Seriously crazy. You’re using a RELATIONAL storage engine and then saying that you can’t actually use relationships. If you don’t have or need relational data, certainly a very valid option, use a non-relational data storage engine. But if you have relational data and you’re using a relational engine, I strongly, very strongly, recommend you use all the relational tools. That includes JOINs, but also primary keys, foreign keys, enforced referential integrity, and all the rest.

5 – Don’t use SSMS’s ‘Edit Top 200 Rows’ because it leaves locks on the table.

One I actually agree with, but not because of what they’re saying. I suspect someone must have found a situation where the UPDATE process from this window held a lock. But I don’t think this is a good way to work because I don’t like the Top 200 rows approach. Why? Well, here’s the query from this menu choice in SSMS:

SELECT TOP (200) BusinessEntityID, CreditCardID, ModifiedDate FROM Sales.PersonCreditCard

Spot any issues? Apart from the missing semi-colon and the missing column owners? Yeah, a TOP without an ORDER BY. Which 200 rows are we getting? Exactly. If you want to edit data, use an UPDATE command or a proper interface for the data, not this.

6 – Stored procedures should never be used. Prepared statements should never be used.

Plan cache should never be used. Oh, I thought that was the next statement. Look, we don’t have to get into the stored-procedures-are-a-blessing-from-Odin vs. stored-procedures-are-a-frost-giant-trick argument. We can say that using parameterized queries (stored procedures or prepared statements) leads to the reuse of plans from cache. Whereas writing nothing but ad hoc queries results in massive numbers of queries piled into the plan cache, each running through the full optimization process, chewing up your memory and your CPU time, and then never being used ever again. Can you think of a more efficient mechanism for hurting your performance than taking away these tools? Me neither. Parameterize your queries, somehow. Every ORM tool I’ve worked with or read about can do this (and they can all use stored procedures too, just saying). There is no good excuse for not using parameterized queries for most of your code. And yes, there are places where ad hoc or dynamic T-SQL makes sense. But I’d argue that they’re the exception, not the rule.
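If stored procedures really are forbidden for some reason, even a prepared, parameterized call gets you plan reuse. A minimal sketch of the idea in T-SQL using sp_executesql (the query and values are just an example):

EXEC sys.sp_executesql
    N'SELECT p.Name, p.ListPrice
      FROM Production.Product AS p
      WHERE p.ProductID = @ProductID;',
    N'@ProductID INT',
    @ProductID = 776;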

Please, take the time to understand what went wrong when you hit a problem. Don’t just shy away from the thing associated with it. It’s like the Dark Ages when they executed inanimate objects after an accident. Surely we’re a little more sophisticated than that. Identify the problem, understand it, then apply appropriate solutions.

If you want to talk more about performance tuning, and you’re attending SQL Rally Nordic 2015 in Copenhagen, I’m doing a full day pre-conference seminar. Click here now to register.

Dec 19 2014

“Pretty Plans vs. Performance” or “Grant Gets Pwned”

If you get an execution plan that looks like this:

[Execution plan image: prettyplan_uglyplan]

I wouldn’t blame you for immediately thinking about query tuning. Especially if the code that generated it looks like this:

SELECT  soh.OrderDate,
        sod.OrderQty,
        sod.LineTotal
FROM    Sales.SalesOrderHeader AS soh
INNER JOIN Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
WHERE   soh.SalesOrderID IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10,
                             @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18,
                             @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26,
                             @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34,
                             @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42,
                             @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50,
                             @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58,
                             @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66,
                             @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74,
                             @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82,
                             @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90,
                             @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98,
                             @p99);

Let’s replace this with a table variable, maybe even one passed in as a parameter. The plan then looks like this:

[Execution plan image: prettyplan_pretty1]

Ah, much prettier. I’m happy now, all’s right with the world… But, just in case, let’s look at performance. The first query ran in about 2.2ms and had 599 reads. The second query ran in about 24ms and had 598 reads… crud.
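For reference, a sketch of what that table variable rewrite might look like (not the exact code from my test harness; the values loaded are just the first few of the 99 SalesOrderID values):

DECLARE @SalesOrderIDs TABLE (SalesOrderID INT PRIMARY KEY);

INSERT  @SalesOrderIDs (SalesOrderID)
VALUES  (43659), (43660), (43661); -- ...and so on, through all 99 values

SELECT  soh.OrderDate,
        sod.OrderQty,
        sod.LineTotal
FROM    Sales.SalesOrderHeader AS soh
INNER JOIN Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
INNER JOIN @SalesOrderIDs AS ids
        ON ids.SalesOrderID = soh.SalesOrderID;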

Well, let’s modify everything again. Instead of a table variable, we’ll use a temporary table and get some statistics into this puppy which will clean things right up. Here’s the new plan:

[Execution plan image: prettyplan_pretty2]

Looks pretty familiar, although there are slight differences in the cost estimates between this plan and the preceding one. But the run time is 85ms with 714 reads, AND I had to create the temporary table, which added time to the whole thing.

Doggone it, that other plan is heinous and ugly and so is the query which uses an IN clause. Shouldn’t the cleaner, simpler, execution plan be an indicator that we’re going to get better performance?

Nope.

The thing is, just because the execution plan is simple and easy to understand does not mean it’s going to perform well. You can’t simply look at an execution plan to understand performance. You have to measure the run times of the query, look at the resources it’s going to use in order to understand where waits are likely, look at its reads, and take all of this into account, along with understanding what the execution plan is doing, in order to make the appropriate choices for performance on your system.

UPDATE:

I kept working with this because I was convinced I could get faster performance. The main difference as I saw it was that the optimizer sorted the data in the IN clause and I wasn’t explicitly sorting the data in any of my replacement queries. But nothing I did resulted in better execution times. And that was upsetting.

First, when you’re tuning a query, you’re going to look at the execution plans, as I did above. But, when you want to measure the performance of queries, it’s a very good idea to turn off execution plans and just capture the query metrics. I knew this and was doing it, and you could see the results in the Extended Events session where I was capturing each statement for the SPID I was working within. I also had SET STATISTICS IO and SET STATISTICS TIME enabled for the query. Since each execution would cause those to fire as part of the statements, and those were making my Extended Events window messy, I decided to turn them off… WHOA! Query execution times radically changed.

In fact, my first attempt at tuning the query, substituting a table parameter, was suddenly faster than the original. The fastest was when I pre-sorted the data in a temporary table (discounting the costs of sorting and inserting the data into the temp table just for the moment). In fact, the prettiest plan was indeed the fastest.

Experimenting further, it was the STATISTICS IO that completely changed the execution times.

In short, pay no attention to my original post above, instead, let the lesson be that I need to be very cautious about the Observer Effect.
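For anyone recreating this, the statement-level capture can be done entirely in Extended Events, with STATISTICS IO and TIME left off. A sketch of the sort of session I mean; the session name and the session_id filter are placeholders, so swap in your own SPID:

CREATE EVENT SESSION QueryPerformance ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.session_id)
    WHERE (sqlserver.session_id = 63)),
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.session_id)
    WHERE (sqlserver.session_id = 63))
ADD TARGET package0.event_file (SET filename = N'QueryPerformance')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO
ALTER EVENT SESSION QueryPerformance ON SERVER
STATE = START;
GO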

Sep 23 2014

SQL Server Query Performance Tuning

The latest update to my book, SQL Server Query Performance Tuning was released last week. This is the fourth edition of the book, and the third edition that I’ve been responsible for. At the urging of my editor, Jonathan Gennick, I have completely restructured the book for this release. The chapters have been broken up and rearranged so that they’re smaller, more easily consumed. Yes, I’ve worked with my technical editor, Joe Sack, to add lots of new information and to ensure that the existing information is more accurate and more useful. But, we’ve also added new chapters on topics that weren’t given enough attention in the previous versions of the book, such as parameter sniffing. It can certainly seem like these book releases are just some incremental changes on top of existing information, but not this time. This is a new book, with a new structure and new material (almost 80 pages worth), but, hopefully, with all the usefulness of the old book fundamentally intact. Please check it out.

If reading books isn’t your thing, or, you just want some personal interaction to assist your learning, I’ll be doing an all day, pre-conference seminar on query tuning at the PASS Summit this year, 2014. Go here to register.

Sep 16 2014

PASS Summit 2014 Pre-Conference Seminar

I’m putting on a pre-conference seminar (also known as a pre-con) at the PASS Summit this year. I’m really honored to be able to present this and I’m pretty excited about it. So, if you want to talk query tuning, let’s get together at the Summit. For a few fun facts about the event, check out this Q&A over at PASS. To register for the event and my pre-con, go here now.