Category: SQL Server 2014

Jul 29 2015

Hey Kids! Let’s Put on a Show at the Old Barn

Alternate Title: I’m traveling a bunch. Let’s get together and talk.

A bunch of trips and presentations are coming up, so I thought I’d share. First, I’ll be at SQL Saturday Omaha for my first time ever presenting in Nebraska. I’m excited to add this state to my list (which is almost at 40 now). If you’re not doing anything August 15th, let’s have a chat. Next, fingers crossed, I’ll get selected to fly back to my home state, Oklahoma, for SQL Saturday OKC. These guys put on a great event and, hey, it’s Oklahoma, so how can it be bad? I hope they announce soon; I need to schedule my flights. This one is on August 29th.

September also has several events. First, I’ll be at SQL Saturday Las Vegas on the 12th of September. It’s my first time at this event, so I’m looking forward to it. That’s followed immediately by a pre-con and a couple of sessions at ITDev Connections. I love inter-disciplinary conferences because they give you a chance to branch out and learn across the stack. It’s extremely useful if you’re getting into DevOps (which you should be). My pre-con is on query tuning and execution plans, using one to help the other. This is a great event with a lot of excellent speakers. Go here to register. But I’m not done. Down in San Diego, at their SQL Saturday on the 19th, I’ll also be doing a pre-con on query performance tuning on the 18th. I’ll see if I can’t squeeze some 2016 stuff in there too. Click here now to register for the pre-con.

But we’re not done. In October, I’m going to hop the pond for one of my absolute favorite events, Red Gate’s very own SQL in the City: London. It’s a great event with a great collection of speakers. I won’t lie, we’re going to focus on Database Lifecycle Management (DLM), but there’ll be sessions on query tuning and all sorts of other stuff as well. It’s focused on the Red Gate tools and, let’s face it, it’s a Red Gate style event, so it’s fun and entertaining. If you really do want to drill down on DLM, there are also the all-day seminars that Red Gate is hosting that week. We’ve put a lot of work into these, and I think they’re some pretty amazing hands-on classes that will get you started automating your own database deployments. Also, while I’m over there, I think I may find my way to a SQL Relay event, so keep your eyes peeled.

Later in October, well, it’s the PASS Summit. ‘Nough said. I’m doing a session focused on the beginner that’s all about statistics. They’re so important to everything you do with queries that it’s a good idea to spend some time understanding them before you get into all the craziness of query tuning. If you’re attending Summit, please stop by my session. Click here to register for the PASS Summit. Fingers crossed, I might be at SQL Saturday Portland if I get accepted.

One more that I don’t want to leave off, because I think it deserves its own paragraph. Right before the PASS Summit, we’re bringing SQL in the City back to Seattle. That’s right, another shot at seeing the great stuff we’re going to be bringing to London, plus a few different things (we mix up the speakers and sessions to help keep it all fresh).

Jul 20 2015

SQL Server Management Studio – Footloose and Fancy Free

That’s right. There’s been a divorce. SQL Server Management Studio (SSMS) has been divorced from the server product. In fact, Microsoft is inviting you to the new SSMS coming out party.

I’m pretty excited about this. While I’m very comfortable in SSMS, to a large degree, it’s like that old pair of jeans that you’ve worn for the last 10 years. They’re comfortable too. Well, maybe a little tight when you pull them on out of the wash. One of the knees is gone. The legs are frayed so much it almost looks intentional. You just noticed a hole in the bottom. The zipper is acting up… Yeah, OK. These jeans have had it. So has SSMS.

The plan from Microsoft is to upgrade SSMS independently from the boxed product. In fact, since one of the goals is to coordinate functionality within SSMS with releases of Azure SQL Database, I think it’s pretty safe to say that we’ll be seeing very regular and frequent updates. While this is a good thing for SSMS, and a good thing for Azure SQL Database, this level of frequent updates might not be a good thing for individual data pros if you’re not used to actively pursuing new knowledge and new tooling all the time. Or, it can act as an incentive to keep your knowledge up to date, because you’re more employable if you’re maintaining your skill set.

Let this act as a prod to go out and learn some new functionality. Oh, and buy a new pair of jeans.

 

Jul 06 2015

But I Don’t Have a Test Server…

I frequently see statements on forums along the lines of “I don’t have a test server, so I’m going to do something I’ve never done before directly on my production servers…” This is usually followed by questions along the lines of:

  • But that’s OK and I won’t get fired, right?
  • So how do I know if it worked?
  • Is that dangerous?

The replies are:

  • No. You should be fired
  • You won’t
  • By All the Gods! Yes! It’s stupidly, insanely, I don’t really like my production server, my data, my boss, or my job, dangerous. Yes.

I understand. You’re working for a not-for-profit, so you can’t afford tons of new servers. You’re looking at a 50TB production server that you can’t make a copy of. Your bosses just don’t want to spend the money on servers for testing. So these are valid reasons to just skip any semblance of testing or assurance that what you’re doing is good, bad, or indifferent for your servers, right? No. No, no, no. 10,000 times, no.

If you’re doing something for the very first time in production, you’re going about your job wrong.

You need to set up a test machine of some type, even if it’s just a tiny VM on your laptop. At a bare minimum, this test machine should have a copy of the structure of your production database. This means you can at least validate that any script that’s going to modify structures will compile and run without error. After that, you should strive to have a representation of production data. This doesn’t have to be an exact copy. In fact, it shouldn’t be an exact copy. You should clean any sensitive data prior to using it for testing. Barring that, you should at least have, again as part of the bare minimum, a copy of your production statistics applied to this test database.

To get a copy of statistics from within SQL Server Management Studio (SSMS), right click on the database in question and select Tasks. Within the Tasks drill down menu select Generate Scripts. You’ll get a simple wizard. Script the entire database. On the Set Scripting Options screen, you’re going to choose Advanced. There, you’ll scroll down until you see Script Statistics. By default, this will be disabled. You’re going to select Script Statistics and Histogram. From there, generate your database script. You can now run this script to create a copy of your production database.

After you create your brand new test database, I’d suggest turning off Auto Update Statistics. If you get a statistics update event to fire through any of the automatic means, it’s going to see that you have no data and will replace your production statistics with empty ones. Done.
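That’s a one-statement change. Here’s a quick sketch; the database name is just a placeholder for whatever you called your test copy:

-- Keeps automatic updates from replacing the scripted production
-- statistics with statistics built from the empty test tables.
ALTER DATABASE [MyTestCopy] SET AUTO_UPDATE_STATISTICS OFF;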

You now have the bare minimum of a test database and you no longer have any excuses for not validating your scripts prior to running them on production.

Want to talk in person about query tuning? I’m doing two different all day pre-conference seminars. The first is in Las Vegas at Connections on September 14th. Click here now to register. The second is at SQL Saturday San Diego on September 18th. Let’s get together and talk about query tuning and execution plans.

Jun 01 2015

Parallelism and Columnstore Indexes

Columnstore indexes are fascinating and really cool. Unfortunately, they’re adding an interesting new wrinkle to an old problem.

What’s the Cost Threshold for Parallelism set to on your server? If you just said “The whatsis of whositz?” then the value is 5. The cost threshold is the point at which the estimated cost of an execution plan goes from definitely serial to possibly parallel. This default was set for SQL Server 2000 and hasn’t been changed since. I’ve long argued, loudly, that it’s too low. I’ve suggested changing it to a much higher value. My advice has gone from 35 to 50 and several places in between. You could just look at the median or the mode of the costs on your system and use the higher of those values as a starting point. Getting a cost higher than 5 in a query is insanely simple, which means that queries that are absolutely not going to benefit from parallelism go parallel anyway. This causes unnecessary CPU load on your system and slows the performance of those queries. So, listen to my advice and run out there and change this value immediately.
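For reference, this is a server-level setting changed through sp_configure. A quick sketch, using 50 only because it’s one of the starting points mentioned above:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
-- Raise the threshold from the default of 5 to a saner starting point.
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
GO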

Or…

Columnstore indexes have two methods of processing. The first is row mode, which is actually the same method used by all your standard indexes and tables. You can write a query that does aggregate calculations against a table that has a Columnstore index and you’ll see a performance benefit. However, check the properties on the Columnstore operator in the execution plan. Does it say “Actual/Estimated Execution Mode = Row”? Then you’re getting row-level processing. That can be fine, and you can see a performance improvement using row mode. There is another level of processing though.

Batch mode processing is where it takes 1000 rows at a time and processes the aggregations simultaneously across multiple threads. Wait, multiple threads? Yes. You have to have parallelism in order to get batch mode. Joe Sack talks about it here. You can read more about it in the SQL Server documentation. Batch mode processing is faster (depending, etc.). Especially if you’re dealing with larger data sets (more than tens of thousands of rows), you’re going to want to see Batch as the Execution Mode in the Columnstore operators.
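If you want to check this on your own system, here’s a minimal sketch. The table and column names are made up purely for illustration; any reasonably large table will do:

-- Hypothetical fact-style table, used only for illustration.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_TransactionDetail
ON dbo.TransactionDetail (ProductID, Quantity, ActualCost);
GO

-- Run with the actual execution plan enabled, then check the Columnstore
-- operator's properties for Actual Execution Mode = Row or Batch.
SELECT  td.ProductID,
        SUM(td.ActualCost) AS TotalCost
FROM    dbo.TransactionDetail AS td
GROUP BY td.ProductID;
GO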

There is an issue here though. That’s right, we just bumped our Cost Threshold for Parallelism up to 5000 (if 50 is good, 500 is better, and we’ll add a zero just to be sure). OK, 50. How high does the cost have to be in order to see benefits from parallelism and batch mode processing? Ah, well, there’s the rub. When I was testing this, I had about 100,000 rows in a table. Nothing large. Execution times went from 100ms to 50ms. Yep, parallel execution, and with it batch mode processing of the nonclustered columnstore index, cut my execution time in half.

Next question should be, oh, well, what was the estimated cost of that query? 30? 20? 15?

Try 3.7.

Yeah, I got a 50% improvement in performance by changing the Cost Threshold from 50 to 3.

In other tests I had a plan with an estimated cost of .77. Execution time went from 75ms to 57ms and reads went from 115 to 62 because of batch mode processing.

Now, I am not advocating dropping everyone’s Cost Threshold to 3 (or less). What I am absolutely saying is that you have a new wrinkle that you must add to your calculations. Are you using Columnstore indexes anywhere? Then you need to examine those execution plans. What are their costs? If you adjust your Cost Threshold below them, do you see a benefit? If so, how low did you just go? And is that negatively impacting your other queries? You could leave your Cost Threshold as is and use trace flag 8649 as a query hint (documented by the Sensei Paul White) on your queries against Columnstore indexes, but that comes with a list of caveats as long as your arm. The other alternative is, again, to leave the Cost Threshold alone and use Adam Machanic’s method of forcing parallelism just for the specified queries. These options can be unattractive, but you have to deal with getting parallelism for Columnstore indexes somehow.
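For completeness, the trace flag option looks something like this, run against the same made-up table from the earlier sketch. The flag is undocumented and unsupported, so treat it as a test-only tool:

-- Trace flag 8649 pushes the optimizer toward a parallel plan for this
-- query only, regardless of the Cost Threshold setting.
SELECT  td.ProductID,
        SUM(td.ActualCost) AS TotalCost
FROM    dbo.TransactionDetail AS td
GROUP BY td.ProductID
OPTION (QUERYTRACEON 8649);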

Welcome to the quandary. Tuning in and around parallelism just got harder because of Columnstore Indexes.

Want to talk query tuning some more? I’m doing a live, all day session at Connections on query tuning using execution plans this September, 2015. We’ll go over examples with columnstore and lots more. Click here now to sign up.

 

May 26 2015

Database Engine Tuning Advisor

I would love to see the Database Engine Tuning Advisor (DTA) pulled from the product. Completely. Heck, I feel bad that I included a chapter on it in my query tuning book (all updated for SQL Server 2014 by the way). Let me tell you why we need to pull this tool.

First, I understand its purpose. It’s supposed to be a fast and easy way to get some performance tuning done for people who just don’t have the time or knowledge to go through the full process of gathering metrics, evaluating poor performers, understanding root causes and applying indexes to fix those causes. I also readily acknowledge that it actually is an amazing piece of software. If you don’t agree with that, go read this white paper. With those acknowledgements in place, I still think there are two distinct problems with the approach and a serious fundamental issue with the premise. Let’s talk about the problems of the approach first.

Many, probably even most, of your query performance problems are in your T-SQL. Some of your query performance problems are in your table structure. Some of your query performance problems are in your indexes and statistics, but not a majority. The DTA can suggest indexes, statistics, materialized views, and partitioning (partitioning for performance I might add, not data management, and that opens up a completely different can of worms we can’t address here, I don’t have time). In short, it can’t address your number one problem, your code. Since it can’t address the number one problem, all it can do is work around the edges. You’ll run this thing, think you’ve addressed your issues, but your issues are still there and now you’re just as stuck as before you found the DTA and ran it. It just doesn’t fix the core issue, your code.

The second problem I see with it is that it doesn’t have enough intelligence built into its functionality. I say that with some reluctance because it really is a pretty amazing piece of functional code. But it suffers from a couple of shortcomings there. It’s completely dependent on the load provided to it. If that load is off, its recommendations are off, because it just doesn’t have the intelligence to recognize that a poorly representative load has been provided. This lack of intelligence is supposed to be offset by the person running the DTA, who must ensure that they are gathering the right information and that they can interpret and test the resultant recommendations. Which brings us to the fundamental issue with the premise.

This is supposed to be run by people with no internals knowledge. Right? BUT! These people are also supposed to make a judgement, based on the recommendations, as to whether or not they should be accepted. Further, they should test all the recommendations prior to applying them to their production server. They also must gather a well-structured and meaningful representative load from their production system in order to supply the DTA with good information. Further, the very sophisticated set of tests around the DTA makes a fundamental assumption that could be radically off: that the person designing the database has done a good and thorough job of correctly normalizing the structures. Do you agree with all these assumptions on the part of the DTA? Am I alone in thinking that we have a problem here? If people lack any understanding of the internals, they won’t be able to judge those recommendations. If people don’t have the ability to gather and interpret performance metrics, they won’t be able to test the suggestions of the DTA or provide it with the necessary test load. In short, the DTA can’t be relied on to solve the problem it’s supposed to solve because of the root cause of that problem: people’s lack of knowledge and understanding.

I won’t even get into finding indexes with names like this:

[_dta_index_TD_13_15_1334295813__K44_K1_K8_K57_K17_K60_K16_2_3_4_5_6_7_9_10_11_12_13_14_15_18_19_20_21_22_23_24_25_26_27_28_29_]

Instead of evaluating the suggestions made by the DTA and applying just those that make sense and will have a positive impact, people assume that every single suggestion from the tool is Gospel. They apply them all, without thinking, without knowledge, without appreciation of the possibility, sometimes even the likelihood, of serious negative impact.

I recognize that many people are stuck. They have a SQL Server instance that’s causing them pain and they don’t have the knowledge necessary to fix it. Further, I know a few of you have used this tool successfully in some situations. I agree that there ought to be some way to mechanically and automatically tune the server. However, the DTA is not that tool, despite its clear and obvious sophistication. Let’s get rid of it.

Want to learn how to tune queries? I’m putting on an all day seminar at Connections in September. Click here right now to register. We won’t use the DTA.

May 05 2015

Effects of Persisted Columns on Performance

I live for questions. And my favorite questions are the ones where I’m not completely sure of the answer. Those are the questions that make me stop presenting in order to take a note so I can try to answer the question later, usually in a blog post. Guess where we are today?

I was asked at SQL Bits in London about the direct impact of the PERSISTED option on calculated columns, for both inserts and selects. I didn’t have a specific answer, so I wrote it down for later (and asked the self-described “persisting Dane” to email me to remind me). He did, so I put together a few tests to try to answer his question.

First, I created three tables:

CREATE TABLE dbo.PersistTest (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

CREATE TABLE dbo.PersistTest2 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2)
);
GO

CREATE TABLE dbo.PersistTest3 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

The first table has two calculated columns, one PERSISTED and one not. The second table has a non-persisted column only. The third table has a persisted column only. This way, I can see the behavior across all these tables and compare them to see where PERSISTED adds to, or removes from, performance.

I took a very simple set of queries and ran these a few times on each table to get an average execution time:

INSERT dbo.PersistTest (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);

The results were:

Persist      Persist 2    Persist 3
3580.5       1626         2260.5

On average, across about five executions, you can see a distinct difference of about 600 microseconds between Persist 2 and Persist 3, and an even larger average on the single Persist table. Let’s try the same thing with a few SELECT statements:

SELECT  pt.CalcValue,
        pt.PersistValue
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.CalcValue
FROM    dbo.PersistTest2 AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.PersistValue
FROM    dbo.PersistTest3 AS pt
WHERE   pt.PersistTestID = 3;
GO

While the results are less spectacular:

Persist      Persist 2    Persist 3
115.5        123.5        109.5

You can see that making the calculation on the fly is more costly, with both Persist and Persist 2 being higher than Persist 3, which had no calculations to perform in order to retrieve its data.

Now, let’s run the same thing, but, how about, 10,000 times so we can really see what the differences are over time. First, the inserts:

Persist      Persist 2    Persist 3
84854        68783        73817

Still measured in microseconds, you can see that the cumulative totals are higher for the persisted columns than they are for the non-persisted column. So, there is an overhead for storing the extra information. But, what about queries?

             Persist      Persist 2    Persist 3
Average      221.25       222.16       100.54
Cumulative   2203431      2254238      1005469

I’d say that’s pretty conclusive. When I hit the SELECT statement 10,000 times instead of just five or six, we clearly have differences in behavior. Interestingly enough, the logical reads were identical at 20,000 each. So it really did come down to the time spent calculating the values versus the time spent just retrieving them.
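If you want to put together something similar yourself, here’s a rough harness. This is my sketch, not necessarily how the numbers above were gathered; it repeats one of the SELECT statements and then reads average elapsed time, in microseconds, out of the plan cache:

SELECT  pt.CalcValue,
        pt.PersistValue
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
GO 10000

SELECT  qs.execution_count,
        qs.total_elapsed_time,   -- microseconds
        qs.total_elapsed_time * 1.0 / qs.execution_count AS avg_elapsed_time,
        qs.total_logical_reads
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE   st.text LIKE '%PersistTest%'
        AND st.text NOT LIKE '%dm_exec_query_stats%';
GO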

Caveats. This is a very simple test with simple calculations and small persisted values. Even doing something 10,000 times isn’t the same as doing it one million or one hundred million. In short, your mileage may vary, some assembly required, for queries lasting longer than four hours call your physician, it depends.

You can see that a PERSISTED column is going to negatively impact INSERT, but it’s going to positively impact SELECT. The question is, which one is giving you pain at the moment?

UPDATE: Transposed the data in the final table between Persist 2 & Persist 3. Fixed it.

Apr 07 2015

Error: Unknown Property ismemoryoptimized

If you’re starting the process of moving your databases in Azure SQL Database to v12, you need to do one thing: make sure you have SQL Server 2014 CU5 or better installed on the machine where you run Management Studio (current list of updates). Otherwise, like me, you’ll be looking at the above error.

Just a quick blog post to help out. I saw this error, did a search, and didn’t hit a single entry telling me what to do. I started the install of CU6 (I needed to catch up on cumulative updates anyway). While that was happening, I went to Twitter and posted to #sqlhelp to see if anyone else had hit this. I got a response from Amit Banarjee pointing me to this MSDN blog on the topic, reinforcing the decision I had already made. I just wish they had posted the error along with everything else in the blog post. It would have made things easier.

Apr 06 2015

Constraints and SELECT Statements

I’ve posted previously about how a foreign key constraint can change how a SELECT query behaves. Logically, that just makes sense. But other types of constraints don’t affect execution plans, do they?

Yes.

Let’s take this constraint as an example:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK 
ADD  CONSTRAINT CK_SalesOrderDetail_UnitPrice 
CHECK  ((UnitPrice>=(0.00)))

That will ensure that no values less than zero can slip in there. We can even validate it:

INSERT Sales.SalesOrderDetail
        (SalesOrderID,
         CarrierTrackingNumber,
         OrderQty,
         ProductID,
         SpecialOfferID,
         UnitPrice,
         UnitPriceDiscount,
         rowguid,
         ModifiedDate
        )
VALUES  (60176, -- SalesOrderID - int
         N'XYZ123', -- CarrierTrackingNumber - nvarchar(25)
         1, -- OrderQty - smallint
         873, -- ProductID - int
         1, -- SpecialOfferID - int
         -22, -- UnitPrice - money
         0.0, -- UnitPriceDiscount - money
         NEWID(), -- rowguid - uniqueidentifier
         GETDATE()  -- ModifiedDate - datetime
        );

That will give me an error:

Msg 547, Level 16, State 0, Line 470
The INSERT statement conflicted with the CHECK constraint "CK_SalesOrderDetail_UnitPrice". The conflict occurred in database "AdventureWorks2014", table "Sales.SalesOrderDetail", column 'UnitPrice'.

Let’s look at a SELECT query now. If we run this:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.';

The resulting execution plan looks like this:

Constraint

But, if I modify the query to look like this, adding an additional AND filter on the constrained UnitPrice column:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.'
        AND sod.UnitPrice > $0.0;

You know what happens to the execution plan? Nothing. It stays exactly the same. The optimizer knows that, in order to satisfy the query, it can safely ignore the change in the WHERE clause. In fact, you can look at the SELECT operator properties for the two different plans and note that while the Query Hash value changes, the Plan Hash value stays the same. The plans are identical.
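You can see the same thing from the plan cache if you’d rather not open the plan properties. Here’s a quick sketch comparing the hashes for the two test queries:

SELECT  SUBSTRING(st.text, 1, 80) AS query_start,
        qs.query_hash,
        qs.query_plan_hash
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE   st.text LIKE '%Water Bottle - 30 oz.%'
        AND st.text NOT LIKE '%dm_exec_query_stats%';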

With that knowledge, I’m going to modify the query to look like this, reversing the UnitPrice reference to look for data that violates the constraint:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.'
        AND sod.UnitPrice < $0.0;

And now we have a new execution plan:

Constraint_scan

 

The optimizer recognized that there is no way any data can be returned with the WHERE clause above, because there is an enforced constraint (note the use of the WITH CHECK clause on the constraint). This completely changes the execution plan in every possible way. Now, instead of attempting to access the data, a Constant Scan operator is put in as a placeholder for an empty result set.

To sum up: yes, constraints absolutely affect the choices made by the optimizer when those constraints would have an effect on the plan, even for a SELECT query. I would also argue that this means the use of enforced constraints can be a performance enhancer, since the optimizer can make intelligent choices about how a given query is dealt with.

UPDATE: Fixed a typo that said the constraint prevented data that was equal to or less than zero. It’s only for data less than zero.

Mar 30 2015

Does the New Cardinality Estimator Reduce Bad Parameter Sniffing?

No. Next question.

Although, that answer can be slightly, ever so slightly, nuanced…

Parameter sniffing is a good thing. But, like a good wine, parameter sniffing can go bad. It always comes down to your statistics. A very accurate set of statistics with very little data skew (some values that have radically more/less data than other values) and a very even distribution (most values have approximately similar cardinality), and parameter sniffing is your bestest buddy on the planet (next to a tested backup). But, introduce some data skew, let the stats get wildly out of date, or suffer from seriously uneven distribution, and suddenly your best friend is doing unspeakable things to your performance (kind of like multi-statement table valued user defined functions).

SQL Server 2014 has the first upgrade to the cardinality estimator since SQL Server 7. To get the best possible understanding of what that means and how it works, I can only recommend Joseph Sack’s white paper. It’s THE source of good information. All the new functionality within the estimator, for the most part, in most situations, will probably lead to more accurate row estimates. This means, in theory, less bad parameter sniffing. But, if you are benefiting from the old estimation engine, and in some cases you absolutely are, then you may see an increase in bad parameter sniffing. It comes right back around to your statistics and your data distribution.

So, does the new cardinality estimator lead to a reduction in bad parameter sniffing? It still mostly depends on your statistics. So, I can’t say with certainty that the new cardinality estimator in SQL Server 2014 literally reduces bad parameter sniffing. In many instances, probably even most, the changes to the model are going to result in better row estimates, which means better choices by the optimizer. In some instances this will reverse and you may see pain directly related to the changes in the cardinality estimator. But, overall, the fundamental behavior of parameter sniffing, again, a very positive thing, is unchanged. So, to a large degree, at least in my mind, no, the new cardinality estimator doesn’t reduce bad parameter sniffing because, most of the time, it’s due to the three causes mentioned before: out-of-date statistics, data skew, and uneven data distribution. The new cardinality estimator doesn’t change those fundamentals, so, mostly, it won’t reduce bad parameter sniffing.
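If you do hit a query that regressed under the new estimator, SQL Server 2014 lets you compare the two models on a per-query basis with trace flags Microsoft has documented for exactly this purpose. A quick sketch, using an arbitrary AdventureWorks query:

-- QUERYTRACEON 9481 runs this query under the legacy (7.0-style)
-- cardinality estimation model even at compatibility level 120;
-- trace flag 2312 does the opposite on older compatibility levels.
SELECT  soh.OrderDate,
        sod.OrderQty,
        sod.UnitPrice
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
WHERE   soh.CustomerID = 29825
OPTION (QUERYTRACEON 9481);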

Mar 24 2015

Simple Parameterization and Data Types

Simple parameterization occurs when the optimizer determines that a query would benefit from a reusable plan, so it takes the hard-coded values and converts them to parameters. Great stuff. But…

Let’s take this example. Here’s a very simple query:

SELECT ct.*
FROM    Person.ContactType AS ct
WHERE   ct.ContactTypeID = 7;

This query results in simple parameterization and we can see it in the SELECT operator of the execution plan:

param

We can also see the parameter that was defined in use in the predicate of the seek operation:

predicate

Hang on.

Who the heck put the wrong data type in there, causing an implicit conversion? The query optimizer did it. Yeah. Fun stuff. If I change the predicate value to 7000 or 700000, I’ll get two more plans, and I can see them all by querying the cache. But here’s a fun little bit: I just searched the XML of the plan, and the data type isn’t stored anywhere that I can see. But, if you query the plan cache, look what the text of the query is:

(@1 tinyint)SELECT [ct].* 
FROM [Person].[ContactType] [ct] 
WHERE [ct].[ContactTypeID]=@1
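That text comes straight out of the plan cache. Here’s a sketch of the sort of query that retrieves it; auto-parameterized plans show up with an objtype of ‘Prepared’:

SELECT  st.text,
        cp.objtype,
        cp.usecounts
FROM    sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE   st.text LIKE '%ContactType%'
        AND cp.objtype = 'Prepared';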

Now, in this case, this type of conversion doesn’t hurt performance in any way. We’re still getting an index seek. I also ran some tests with strings. The optimizer was setting them to VARCHAR(8000), and I got an implicit conversion that, again, didn’t hurt performance. But I wouldn’t be completely shocked if, at some point, that led to scans. What I find really interesting is that the data type isn’t stored with the plan. I fully expected it to be there.