Sep 06 2016

Query Store, Force Plan and “Better” Plans

I am endlessly fascinated by how the Query Store works. I love teaching it at every opportunity too. Plus, almost every time I teach it, I get a new question about the behavior that makes me delve into the Query Store just a little bit more, enabling me to better understand how it works. I received just such a question at SQLSaturday Norway:

If you are forcing a plan, and the physical structure changes such that a “better” plan is possible, what happens with plan forcing?

Let’s answer a different question first. What happens when the plan gets invalidated, when the index being used gets dropped, or some other structural change occurs so that the plan is no longer valid? I answered that question in this blog post. The plan being forced, after the object is dropped, becomes invalid, so that plan can no longer be used. The Query Store still attempts to apply the plan during any recompile or compile event of the query in question, but it fails and a proper plan is used. All this means, I think, that the Query Store is going to ignore the new index, since a new index doesn’t invalidate an existing plan. A new index just makes new plans possible. However, when I was asked this question, this wasn’t something I had tested, so I gave a speculative, best-guess answer with plenty of caveats and the promise to provide a tested answer ASAP. Here we go.

I’ll start with the same sample query:

SELECT  sit.Quantity,
        sit.TransactionOccurredWhen,
        i.InvoiceDate,
        si.StockItemName
FROM    Warehouse.StockItemTransactions AS sit
JOIN    Sales.Invoices AS i
        ON i.InvoiceID = sit.InvoiceID
JOIN    Warehouse.StockItems AS si
        ON si.StockItemID = sit.StockItemID
WHERE   sit.TransactionOccurredWhen BETWEEN '3/1/2015'
                                    AND     '3/5/2015';

The results are returned in about 53ms with 4850 reads and this execution plan:

PlanWithoutIndex

As you can see, there’s the suggestion of a possible missing index. We’ll apply that in a moment. First though, I’m going to get the plan and query identifiers from the Query Store:

SELECT  qsq.query_id,
        qsp.plan_id,
        CAST(qsp.query_plan AS XML)
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  sit.Quantity,
        sit.TransactionOccurredWhen,%';

With this information, I’ll use Force Plan to ensure this is the plan used going forward.

EXEC sys.sp_query_store_force_plan 42995,487;

With that done, I’ll create the index:

CREATE INDEX TransactionOccurredWhen
ON Warehouse.StockItemTransactions
(TransactionOccurredWhen)
INCLUDE (StockItemID,InvoiceID,Quantity);

When I perform this same set of operations without plan forcing (run the query, identify the missing index, create the new index, rerun the query), the execution plan above is replaced with one that uses the index, resulting in about 32ms execution time with 2942 reads, a significant improvement. You get a recompile event because the schema involved with the query has changed. With the change, a new index is available, so the recompile uses that new index. What happens when you force the plan?

The recompile event after running CREATE INDEX still occurs. However, because we have elected to force a plan, that plan is what is used. In this instance, a recompile to a new plan would result in a faster query using fewer resources. However, as long as we’re forcing a plan and that plan stays valid, the plan will be forced.
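
If you want to verify that forcing is still in effect, and whether it has ever failed, the Query Store catalog views will show you. A quick sketch; these columns all exist in sys.query_store_plan:

SELECT  qsp.plan_id,
        qsp.query_id,
        qsp.is_forced_plan,
        qsp.last_force_failure_reason_desc,
        qsp.force_failure_count
FROM    sys.query_store_plan AS qsp
WHERE   qsp.is_forced_plan = 1;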

In short, the behavior is exactly as I expected. Choosing to force a plan in the Query Store results in that plan being forced. While I think that the Query Store and plan forcing are wonderful new tools in our tool box, I am concerned that plan forcing will become far too easy a thing to implement. I worry that people will implement it without thinking through the implications and potential impacts.

It gets worse. If I change the query, let’s say I make it into a stored procedure and parameterize it, and, instead of a very limited date range, I send in a whole month, the execution plan is quite different (with or without the index). Forcing the plan that is expecting fewer than 1,000 rows onto a query that is retrieving 10,000 rows results in pretty horrific performance. We really are going to have to be careful about using plan forcing appropriately because, as in so much of the rest of SQL Server, and in all of programming for that matter, the code is going to do exactly what we tell it to do, whether that’s what we really want it to do or not.
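
For illustration, here’s a sketch of the kind of parameterized procedure I’m describing; the procedure name and parameter names are mine, not from the original test:

CREATE PROCEDURE dbo.StockItemTransactionInfo (@StartDate DATETIME2, @EndDate DATETIME2)
AS
SELECT  sit.Quantity,
        sit.TransactionOccurredWhen,
        i.InvoiceDate,
        si.StockItemName
FROM    Warehouse.StockItemTransactions AS sit
JOIN    Sales.Invoices AS i
        ON i.InvoiceID = sit.InvoiceID
JOIN    Warehouse.StockItems AS si
        ON si.StockItemID = sit.StockItemID
WHERE   sit.TransactionOccurredWhen BETWEEN @StartDate AND @EndDate;
GO

Force the plan captured for a five-day range, call this with a whole month, and you get exactly the mismatch described above.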

Aug 15 2016

Query Store, Force Plan and Dropped Objects

I love the Query Store. Seriously. It’s a huge leap forward in the capabilities of Azure SQL Database and SQL Server in support of performance monitoring and query optimization. One of my favorite aspects of the Query Store is the ability to force plans. Frankly though, it’s also the scariest part of the Query Store. I do believe that plan forcing will be one of the most ill-used functions in SQL Server since the multi-statement table-valued user-defined function (don’t get me started). However, unlike the UDF, this ill-use will be because of poor understanding on the part of the user, not a fundamental design issue. No, plan forcing and the Query Store are very well constructed. Let me give you an example of just how well constructed they are.

Let’s imagine that you have a situation, such as bad parameter sniffing, where you’ve determined that, of the several possible execution plans for a query, there is one preferred plan. Enabling plan forcing to ensure that plan gets used is a no-brainer. Let’s further imagine that you have a junior DBA who is… let’s just say overly aggressive in their duties, such that they do silly things occasionally. What happens when your pretty plan, which uses a particular index, meets your junior DBA, who just dropped that index?

Here’s the setup. We’re using the WideWorldImporters database and we have this query:

SELECT  *
FROM    Warehouse.StockItemTransactions AS sit
WHERE   sit.TransactionOccurredWhen BETWEEN '9/9/2015'
                                    AND     '9/11/2015';

This query, with the default configuration, will scan the existing table, so I’ll add an index:

CREATE INDEX TransactionOccurredWhenNCI
ON Warehouse.StockItemTransactions
(TransactionOccurredWhen);

For a limited range such as the one I’m passing above, I’ll get a plan with a key lookup operation, which runs faster than the scan, so I’m happy. For a broader range, I’m likely to see a scan again but, since most of my queries have a very narrow range, I’d sure like to be able to force the plan to always compile to the seek and key lookup. To do this, I need to find the query_id and plan_id from the Query Store (assuming I’m not using the GUI):

SELECT  qsp.plan_id,
        qsp.query_id,
        qsqt.query_sql_text,
        qsp.count_compiles
FROM    sys.query_store_plan AS qsp
JOIN    sys.query_store_query AS qsq
        ON qsq.query_id = qsp.query_id
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  *
FROM    Warehouse.StockItemTransactions AS sit%';

With those values, I can force the execution plan so that it will always use the plan I want:

EXEC sys.sp_query_store_force_plan 42460,463;

That’s it. I’m happy because I’m going to see the execution plan used over and over, despite any values passed during a recompile.

Then…

Along comes our aggressive junior DBA, who decides that there are “too many” indexes on the server. No, I don’t know what that means either, but they evidently read it on the internet or something, so they drop the index we created before:

DROP INDEX TransactionOccurredWhenNCI ON Warehouse.StockItemTransactions;

What now happens to our lovely execution plan and the plan forcing? We’ll take a look at two events in Extended Events, sql_statement_recompile and query_store_plan_forcing_failed. Nothing happens immediately on dropping the index. The plans associated with that object, if any, are marked as invalid in the cache. The next time we call the query it’s going to recompile and we can see the event:

recompile_event

The most important part of the event is the recompile_cause, which is “Schema changed”. However, I would also note the attach_activity_id.guid. I’ve chosen to enable causality tracking in this Extended Events session. This causes all events associated with a common activity to get a GUID and a sequence number. This is interesting because, after the recompile event, we get the query_store_plan_forcing_failed event:

plan_forcing_failed

The guid value is the same as the event above and the *.seq number is now 2, showing that, for these events, the recompile event occurred and then this event occurred. That makes perfect sense. The plan is marked for recompile, so, it’s going to be recompiled. I have enabled plan forcing though, so I have a particular plan that I want the optimizer to use. However, thanks to my “helpful” junior DBA, the plan is now invalid. You even get the description of what happened in the message field for the event:

Index ‘WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhenNCI’, specified in the USE PLAN hint, does not exist. Specify an existing index, or create an index with the specified name.
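
For reference, here’s a minimal sketch of an Extended Events session that captures both events with causality tracking turned on. The session name and the file target are my own choices:

CREATE EVENT SESSION QueryStorePlanForcing ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
    (WHERE (sqlserver.database_name = N'WideWorldImporters')),
ADD EVENT sqlserver.query_store_plan_forcing_failed
ADD TARGET package0.event_file (SET filename = N'QueryStorePlanForcing')
WITH (TRACK_CAUSALITY = ON);

ALTER EVENT SESSION QueryStorePlanForcing ON SERVER STATE = START;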

The first question now is, what happens with this query and the execution plan? Does the new plan generated now that the index is missing get stored in cache? Yes, it does. We can validate that by querying the cache, or, when capturing the actual execution plan, checking the “Retrieved from cache” property.
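
Here’s a sketch of that cache check; the LIKE filter is just a convenience for finding the statement:

SELECT  decp.usecounts,
        decp.cacheobjtype,
        dest.text
FROM    sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE   dest.text LIKE 'SELECT  *%StockItemTransactions%';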

Because plan forcing is enabled, do we see a recompile every time this query is called? The answer to that question is slightly complex. Under normal circumstances, no. As long as that plan remains in cache, it’s simply reused. No other recompiles occur. A normal recompile event will cause another attempt at applying the invalid execution plan and we would see yet another query_store_plan_forcing_failed event for each recompile on the query. However, during testing, Joey D’Antoni (who was helping me play with this when we discussed what would happen when a plan was made invalid) had severe memory pressure on his server. He saw intermittent recompiles with a cause message that said plan forcing had failed. So if your server is under extreme stress and you cause this issue, you might see different messages. Just remember, the cause of the recompiles was not the plan forcing, but the memory pressure.

The fun thing is, as long as I don’t remove the plan forcing or take the query and plan out of the Query Store manually, if I recreate the index on my table with the same name and definition as that expected by the plan, the Query Store will simply reapply the plan and then successfully force it during any subsequent recompile situation. This is because Query Store is persisted with the database and barring outside activity, the information there will remain, just like the rest of the data in the database.
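
The fix for our scenario, then, is simply to put the index back, exactly as it was originally defined:

CREATE INDEX TransactionOccurredWhenNCI
ON Warehouse.StockItemTransactions
(TransactionOccurredWhen);

On the next recompile, the forced plan applies cleanly again.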

All of this means that Query Store works exactly the way we would expect, not forcing additional recompiles when you, or your junior DBA, inadvertently invalidate a plan. It also works as expected in that forcing a plan is stored with your database so that, assuming you don’t remove that plan from the Query Store, it will simply be reapplied after you fix the problem. It’s fun to see the thought that went behind the design of the behavior of Query Store. However, please, use plan forcing judiciously.

Jul 25 2016

Monitor Query Performance

Blog post #7 in support of Tim Ford’s (b|t) #iwanttohelp, #entrylevel. Read about it here.

Sooner or later when you’re working with SQL Server, someone is going to complain that the server is slow. I already pointed out the first place you should look when this comes up. But what if they’re more precise? What if, you know, or at least suspect, you have a problem with a query? How do you get information about how queries are behaving in SQL Server?

Choices For Query Metrics

It’s not enough to know that you have a slow query or queries. You need to know exactly how slow they are. You must measure. You need to know how long they take to run, and you need to know how many resources are used while they run. You need these numbers so that, after you do something to try to help the query, you can determine whether you’ve improved performance. To measure the performance of queries, you have a number of choices, each with positives and negatives. I’m going to run through my preferred mechanisms for measuring query performance and outline why. I’ll also list some of the other mechanisms you have available and tell you why I don’t like them. Let’s get started.

Dynamic Management Views

Since SQL Server 2005, Dynamic Management Views (DMV) and Functions (DMF) have been available for accessing all sorts of information about the server. Specifically, there are a few DMVs that are focused on queries and query performance. If you go back through my blog, you can find tons of examples where I illustrate their use. You can also see them at work in commercial tools and free tools. Adam Machanic’s sp_WhoIsActive, a free tool, makes extensive use of DMVs. To learn more about DMVs, you can download a free book, Performance Tuning with SQL Server Dynamic Management Views. DMVs are available in Azure SQL Database, Azure SQL Data Warehouse, and all editions of SQL Server.

The information captured by the DMVs is an aggregation of all the times the query has been run. This means you can’t find out how long the query ran at 3PM yesterday. You can, though, see the minimum and maximum time the query took, as well as the average. The ability to see this information is what makes DMVs useful. However, another important point about DMVs is that they only collect information while a query is in memory. As soon as the query leaves the cache (the area of memory it is stored in), so does all the aggregated information about it in the DMVs.
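
As a sketch of the kind of aggregated information available, this pulls average metrics per statement from sys.dm_exec_query_stats; the TOP and the ordering are my own choices:

SELECT TOP (10)
        SUBSTRING(dest.text,
                  (deqs.statement_start_offset / 2) + 1,
                  ((CASE deqs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(dest.text)
                        ELSE deqs.statement_end_offset
                    END - deqs.statement_start_offset) / 2) + 1) AS statement_text,
        deqs.execution_count,
        deqs.total_elapsed_time / deqs.execution_count AS avg_elapsed_time,
        deqs.total_logical_reads / deqs.execution_count AS avg_logical_reads
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY avg_elapsed_time DESC;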

You use the DMVs for a general understanding of how a query is behaving. They’re not meant for detailed, long-term collection of information about queries. For that, we use other tools.

Extended Events

Introduced in SQL Server 2008, Extended Events (ExEvents) are a mechanism for capturing detailed information about SQL Server and the processes within. One of those processes is how queries behave. I have multiple examples on this blog on using ExEvents. You can’t go wrong reading about them on Jonathan Kehayias’ blog. Extended events are available in Azure SQL Database and all editions of SQL Server.

When you need to know every query against a database, or each time a particular query is called, and all the details associated with the query (reads, writes, duration), ExEvents are the way to go. ExEvents are very lightweight on the server (but not free) and can be filtered so that you capture just the information you need. The information is detailed and not aggregated; instead, it’s raw. The real issue with capturing this data is the amount of data you’ll be capturing. Testing, and careful filtering to ensure you don’t capture too much information, is important. Prior to SQL Server 2012, there was no graphical user interface for reading ExEvent data, so you would have been forced to run queries against the XML within which the information is captured. With the tools available in SQL Server Management Studio, this is no longer the case.
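
A minimal sketch of such a filtered session; the session name, database, and duration threshold are all my own choices:

CREATE EVENT SESSION QueryPerformance ON SERVER
ADD EVENT sqlserver.sql_batch_completed
    (WHERE (sqlserver.database_name = N'WideWorldImporters'
            AND duration > 100000)) --duration is measured in microseconds
ADD TARGET package0.event_file (SET filename = N'QueryPerformance')
WITH (TRACK_CAUSALITY = ON);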

You use ExEvents when you need specific and detailed information about a query. ExEvents are not so good for generalized monitoring.

Query Store

Introduced in Azure SQL Database, and first released in SQL Server with 2016, Query Store is another mechanism for capturing aggregated information about queries. As before, I have examples on how to work with Query Store on my blog. You can also find quite a bit on it over at Simple-Talk. Query Store is pretty specialized still and only available in Azure and SQL Server 2016, but it is in all editions of SQL Server 2016.

Query Store captures information similar to what is available in the DMVs. However, unlike the DMVs, the information that Query Store captures is kept around, even after a query ages out or is removed from cache. This persistence makes Query Store very exciting. You do have to choose to turn it on for each database you wish to capture queries for; it’s not automatic like the DMVs. The capture processes are asynchronous, so they should be relatively lightweight for most databases.
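
Turning it on is a single statement per database. A sketch, with a couple of the configuration options shown (the specific values are my choices, and there are more options available):

ALTER DATABASE WideWorldImporters SET QUERY_STORE = ON;

ALTER DATABASE WideWorldImporters
SET QUERY_STORE (OPERATION_MODE = READ_WRITE,
                 QUERY_CAPTURE_MODE = AUTO,
                 MAX_STORAGE_SIZE_MB = 100);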

You use the Query Store when you need to capture query metrics over the long term, but you don’t need detailed information and aggregation works well for you.

Others

There are other ways to measure query performance. You can use the Profiler GUI, but that actually has a serious, negative impact on the server. You can bring a server down by using it, so it should be avoided. Profiler can generate trace scripts, which can be used to monitor your server with less impact than the GUI. However, traces still have a much higher impact than ExEvents, and they’re on the deprecation list. Microsoft is not adding new trace events for new functionality, so traces are becoming less and less useful with each release. You also can’t use trace against Azure. If you’re writing a query and you just want to see how long it takes to run, you can use SET STATISTICS TIME ON to capture the execution time. This is a handy way to quickly measure performance. There is also the ability to capture reads and writes using SET STATISTICS IO ON but, while this does capture the metrics we need, it adds considerable overhead to the query, skewing performance measurement. This is why I stick to ExEvents when I need an accurate measure.
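
Using them is as simple as wrapping the query in question; just remember that the IO measurement in particular skews the timing:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT  *
FROM    Warehouse.StockItemTransactions AS sit
WHERE   sit.TransactionOccurredWhen BETWEEN '9/9/2015'
                                    AND     '9/11/2015';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;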

Conclusion

Honest people can disagree about the best way to capture query performance. I have my preferences as you can see. However, I’m fairly certain that everyone would agree that it’s important to know how to capture performance metrics in order to be able to assert that performance has increased or decreased in a measured fashion. You don’t want to guess at query performance, you want to know.

Jul 18 2016

Common Table Expression, Just a Name

The Common Table Expression (CTE) is a great tool in T-SQL. The CTE provides a mechanism to define a query that can be easily reused over and over within another query. The CTE also provides a mechanism for recursion which, though a little dangerous and overused, is extremely handy for certain types of queries. However, the CTE has a very unfortunate name. Over and over I’ve had to walk people back from the “Table” in Common Table Expression. The CTE is just a query. It’s not a table. It’s not providing a temporary storage space like a table variable or a temporary table. It’s just a query. Think of it more like a temporary view, which is also just a query.

Every time I explain this, there are people who don’t believe me. They point to the “Table” in the name, “See. Says so right there. It’s a table.”

It’s not and I can prove it. Let’s create a relatively simple CTE and use it in a query:

WITH    MyCTE
          AS (SELECT    c.CustomerName,
                        cc.CustomerCategoryName
              FROM      Sales.Customers AS c
              JOIN      Sales.CustomerCategories AS cc
              ON        cc.CustomerCategoryID = c.CustomerCategoryID
              WHERE     c.CustomerCategoryID = 4)
    SELECT  *
    FROM    MyCTE;

Now, I’m going to run the query within the CTE and the CTE together as two statements in a batch and capture the execution plans:

ExecPlans

On the top, the CTE, on the bottom, the query. You’ll note that the execution plans are identical. They each have the exact same Query Plan Hash value in the properties, 0x88EFD2B7C165E667, even though they have different Query Hash values, 0x192FFC125A08CC35 and 0xFEB7F2BCAC853CD5, respectively. Further, if I capture the query metrics using extended events, I get identical reads and, on average, identical execution times:

duration

This is because there is no table being created. The data is not treated differently. A CTE is just a query, not some type of temporary storage.
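
If you want one more proof point, inline the CTE as a derived table; as far as I can tell, the optimizer treats the two forms identically:

SELECT  *
FROM    (SELECT c.CustomerName,
                cc.CustomerCategoryName
         FROM   Sales.Customers AS c
         JOIN   Sales.CustomerCategories AS cc
         ON     cc.CustomerCategoryID = c.CustomerCategoryID
         WHERE  c.CustomerCategoryID = 4) AS MyDerived;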

Heck, let’s do one more thing. Let’s use the latest SSMS plan comparison tool and highlight one of the operators to see what differences there are internally in the plan:

plancompare

I don’t see a lot of differences. In fact, I don’t see any. That’s because the optimizer recognizes these two queries as identical. If it was loading data into temporary storage, you would see differences in something. We don’t. This is because, despite the somewhat unfortunate emphasis that gets placed on the Table portion of the name, the emphasis of the name, Common Table Expression, should be on the word Expression.

I will point out an interesting difference, especially useful for those who plug in CTEs everywhere, whether it’s needed or not. Let’s look at the properties of the two plans:

properties

You can see the similarities and differences that I pointed out earlier in the Statement, Query Hash, and Query Plan Hash, as well as the Estimated Subtree Cost and others. What’s truly interesting is that the CompileCPU, CompileMemory, and CompileTime for the CTE are higher than for the regular query. While the CTE is just a query, it’s a query that adds a non-zero overhead when used and, therefore, should only be used where appropriate (good gosh, I’ve seen people put it EVERYWHERE, on every single query; don’t do that).

Hopefully, this is enough to establish, truly, completely, and thoroughly, that the Common Table Expression is an expression, not a table.

Yeah, I did this before, but it keeps coming up, so I tried a different approach. Let’s see if the word gets out. Your Common Table Expression is not a table.


I love talking about execution plans and query tuning. I’ll be doing this at an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

Don’t miss your chance to attend an all day training course on execution plans before SQL Saturday Oslo in September.

Jul 11 2016

Same Query, Different Servers, Different Performance. Now What?

Based on the number of times I see this question on forums, it must be occurring all the time. You have two different servers that, as far as you know, are identical in terms of their options and setup (although not necessarily in terms of power; think a test or pre-production system versus production). On each of these servers you have a database that, as far as you know, is the same as the other in terms of options, objects, maybe even data (although, this does mean that you have unmasked production information in your QA environment, which potentially means you’re going to jail; might want to address this, especially now that I’ve told you about it, mens rea, you’re welcome). On each database you run, as far as you know, the exact same query (whether a straight-up ad hoc query, stored procedure, parameterized query, whatever). On your production server, let’s call it PROD, the query runs for 3 hours before it returns, but on the test server, let’s call it TEST, it returns in about 3 seconds.

What. The. Hell.

This is especially troubling because PROD has 8 processors that are much more powerful than TEST, which has only 2 (either can go parallel, so this is the same). TEST only has 16GB of RAM and PROD has 64GB. Further, you know that the databases, data, and statistics are the same because you restored the TEST database from PROD (although, see that jail thing above). However, you’re on top of things. You capture the query metrics so you know exactly the amount of time and the reads or writes from each query and you even do this using extended events so you’re minimizing observer effects. You also have monitoring in place, so you know that there is no blocking in PROD. The query is just a pig there, but not on TEST. You’ve even gone so far as to get an estimated execution plan from each server and they are absolutely different.

Now what?

Well obviously something somewhere is different. Start by comparing everything on both servers and both databases down to… hang on, here, we’ll write a PowerShell script and then….

Wait, wait, wait!

You have the execution plans? Before we start digging through all the properties everywhere and comparing everything to everything, updating statistics 14 times, and all the rest, what if we look at the execution plans? They’re different, so we should start looking at scans & indexes & statistics &….

Wait!

Let’s start simple and clear. Let’s take a look at the properties of the first operator:

AllProperties

This contains a ton of information about the settings of the systems where each plan was generated. Let’s assume that, probably, the databases are the same, as is the data and the statistics, but the problem is a system or database setting. These differences can result in different execution plans and therefore different execution times. If we are dealing with two different servers and we are fairly sure the data, structure and the statistics are the same, the properties of the first operator are a great starting point for understanding what went wrong.

Oh, and the first operator is this one with the red square around it:

first operator

Let’s use the new SSMS Compare Showplan to see the differences between our two execution plans that we captured:

properties compared

(clicking on that will make it bigger)

There are a bunch of differences highlighted, but one ought to jump out pretty quick. That’s right, these two databases have different compatibility levels which resulted in one of them using the new optimizer and one using the old optimizer as evidenced by the CardinalityEstimatorModelVersion. Fun point, both have identical query hash values. Not surprising, but additional, quick, validation that we really are talking about the same query on each server (and one of them isn’t using a trace flag to change the cardinality estimator). You now have enough information to go and make some changes to your system without lots of further fumbling and work.
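
From there, the check and the fix are both quick. A sketch (the database name is a placeholder for your own):

SELECT  d.name,
        d.compatibility_level
FROM    sys.databases AS d;

ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 130;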

When dealing with the same query from two servers that ought to result in similar behavior, but doesn’t, get the execution plans (estimated plans are fine here) and compare the properties of the first operator. That’s the quickest way to identify the issues that could be leading to the differences between the servers.


Want to talk more about execution plans and query tuning? Let’s do it.

In August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’m also going to be in Oslo Norway for a pre-con before SQL Saturday Oslo in September.

Jun 28 2016

CASE Statement in GROUP BY

Set based operations means you should put everything into a single statement, right?

Well, not really. People seem to think that having two queries is really bad, so, when faced with logical gaps, they just cram them into the query they have. This is partly because SQL Server and T-SQL support letting you do this, and it’s partly because it looks like a logical extension of code reuse to arrive at a query structure that supports multiple logic chains. However, let’s explore what happens when you do this in one particular situation: a CASE statement in a GROUP BY clause.

You see this a lot because a given set of data may be needed in slightly different contexts by different groups within the company. Like many of my example queries, this could be better written. Like many of my example queries, it mirrors what I see in the wild (and, for those following along at home, I’m using the WideWorldImporters database for tests now):

CREATE PROCEDURE dbo.InvoiceGrouping (@x INT)
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY CASE WHEN @x = 7 THEN i.ContactPersonID
              WHEN @x = 15 THEN i.AccountsPersonID
              ELSE i.SalespersonPersonID
         END;
GO

Running this for any given value above, 7, 15 or other, you’ll get the same execution plan, regardless of the column used in the GROUP BY. However, Parameter Sniffing is still something of a factor. When you group this data by SalesPersonID, you only get 10 rows back. This will be shown as the estimated number of rows returned if some value other than 7 or 15 is used as a parameter. However, this is always the plan:

coreplan

You can click on that to expand it into something readable. We can eliminate the Parameter Sniffing from the equation if we want to by modifying the query thus:

CREATE PROCEDURE dbo.InvoiceGrouping_NoSniff (@x INT)
AS
DECLARE @x2 INT;
SET @x2 = @x;

SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY CASE WHEN @x2 = 7 THEN i.ContactPersonID
              WHEN @x2 = 15 THEN i.AccountsPersonID
              ELSE i.SalespersonPersonID
         END;
GO

However, except for some deviation on the estimated rows (since it’s averaging the rows returned), the execution plan is the same.

What’s the big deal right? Well, let’s break down the code into three different procedures:

CREATE PROCEDURE dbo.InvoiceGrouping_Contact
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.ContactPersonID;
GO

CREATE PROCEDURE dbo.InvoiceGrouping_Sales
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.SalespersonPersonID;
GO

CREATE PROCEDURE dbo.InvoiceGrouping_Account
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.AccountsPersonID;
GO

Interestingly enough, these three queries produce a nearly identical execution plan. The one big difference is that the Compute Scalar operator, which is used to generate a value for the Hash Match Aggregate, is no longer in the plan:

specificplan

Each plan uses the same basic set of structures, scans against both tables, to arrive at the data. Cost estimates between the two plans are very different though, with the targeted queries having a much lower estimated cost.

Performance-wise, interestingly enough, the average execution time of the first query, only returning the 10 rows, is 157ms on average, while the query grouping directly on the SalesPersonID averages about 190ms. Now, the reads tell a slightly different story with 17428 on the generic query and 5721 on the specific query. So, maybe a server under load will see a significant performance increase. However, let’s deal with what we have in front of us and say that, at least for these tests, the catch-all GROUP BY query performs well.

Now let’s change the paradigm slightly. Let’s add an index:

CREATE INDEX TestingGroupBy ON Sales.Invoices (SalespersonPersonID);

Frankly, this isn’t a very useful index. However, after adding it, the execution plan for the InvoiceGrouping_Sales query changes. Instead of scanning the table, it’s now scanning the index. Despite recompiles and attempts to force it using hints, the original InvoiceGrouping query will not use this index. Duration of the InvoiceGrouping_Sales query drops to 140ms on average and the reads drop a little further to 5021. Getting an 11% increase on performance is a win.

This is a pretty simplified example, however, making the CASE statement more complex won’t improve performance or further assist the optimizer to make good choices. Instead of trying to cram multiple different logical groupings into a single query, a better approach would be to create the three new procedures that I did above, and make the original InvoiceGrouping procedure into a wrapping procedure that chooses which of the individual procedures to call. This way, if you do add indexes in support of each of the different possible groupings, you would realize a positive outcome in your performance.
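
Here’s a sketch of what that wrapper might look like, reusing the three procedures above (the wrapper logic mirrors the original CASE expression; the procedure name is mine):

CREATE PROCEDURE dbo.InvoiceGrouping_Wrapper (@x INT)
AS
IF @x = 7
    EXEC dbo.InvoiceGrouping_Contact;
ELSE IF @x = 15
    EXEC dbo.InvoiceGrouping_Account;
ELSE
    EXEC dbo.InvoiceGrouping_Sales;
GO

Each branch gets its own plan, compiled for its own grouping column, so an index added in support of one grouping doesn’t get ignored.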


Want to talk more about execution plans and query tuning? In August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’m also going to be in Oslo Norway for a pre-con before SQL Saturday Oslo in September.

May 31 2016

Precedence Goes to Query Store or Plan Guide?

While presenting at SQLDay in Wroclaw, Poland, on the Query Store, I was asked a pretty simple question, which takes precedence, the Query Store or a Plan Guide?

One of my favorite answers to questions is “I don’t know” because it gives me the opportunity to learn. Let’s figure this one out together.

I’ll post the code to recreate this experiment within AdventureWorks at the end of the article. I’m doing this because the code for forcing execution plans using Plan Guides can be pretty doggone long (you may need to generate your own XML from a plan on your own system, fair warning).

I have a stored procedure that I use frequently to demonstrate parameter sniffing and bad parameter sniffing, AddressByCity (listed below). This query, when passed the value ‘London’, will generate a plan with a Merge Join. When passed the value ‘Mentor’, it will generate a plan with a Loops Join. These different plans are caused by the statistics for the different values: 434 rows for ‘London’ and 1 row for ‘Mentor’. To start the test, I’ll generate a plan for each value, dropping the plan from cache between executions, so that I have both plans in the Query Store. Now, we’re ready for phase 1.

Phase 1

I’m going to remove the current plan from cache. I will then use sp_query_store_force_plan to select the plan with the Loops join. I’ll run the query using ‘London’ to validate that the plan generated is NOT the plan for ‘London’ but rather the plan for ‘Mentor’. Done, and I can see the Use Plan results in the Properties of the Select operator:

QSvPG_UsePlan

I’ve highlighted a couple of interesting points. First, even though we are using the plan that was compiled for the value ‘Mentor’, this compile event used the value ‘London’, so that’s what gets stored as the compiled value, despite the plan used. You can then see, down at the bottom, that the ‘Use plan’ property is set to true. Now, let’s force the plan using a plan guide (in my script below). To be very sure what happens, I’ll again clear the plan from cache.

If the Query Store wins, I should see a plan with a Loops Join. If the Plan Guide wins, I should see a plan with a Merge Join. The moment of truth:

QSvPG_Plan

Query Store wins! Let’s see how it shows in the Properties:

QSvPG_PlanGuideName

You’ll see that the Parameter Compile & Runtime values are the same and that the ‘Use plan’ property is the same. Note, however, that the PlanGuideName property is in play. In fact, the indication is that the optimizer thinks that a Plan Guide has been successfully applied. Indeed, if I watch Extended Events for the plan_guide_successful event (the best way to check how Plan Guides are being used, apart from looking at the execution plans), it shows success for the Plan Guide we created.
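
If you want to watch these events yourself, here’s a minimal session sketch; the name and target are my own choices:

CREATE EVENT SESSION PlanGuideWatch ON SERVER
ADD EVENT sqlserver.plan_guide_successful,
ADD EVENT sqlserver.plan_guide_unsuccessful
ADD TARGET package0.ring_buffer
WITH (TRACK_CAUSALITY = ON);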

Phase 2

Precedence could matter. First, I remove the Plan Guide using sp_control_plan_guide. Then, I remove the Query Store force using sp_query_store_unforce_plan. I’ll also clear the plan out of cache again, just to be sure. Next, I create the Plan Guide. I’ll run the procedure using the value ‘Mentor’ to ensure that I see the Merge/’London’ plan, the PlanGuideName property in that plan, and a successful application of a Plan Guide from the plan_guide_successful ExEvent. All good.

Now, I once again force the plan from Query Store. Rerun the plan…

The winner is Query Store.

Phase 3

OK, so, if we try to force plans using a Plan Guide, Query Store wins. What about if I’m just trying to apply a hint? Let’s clear everything up again and then create an OPTIMIZE FOR query hint in a Plan Guide. Simple, right?

Query Store wins again… Except things get a little odd. Take a look at the Properties now:

QSvPG_Parameter

Normally, with a Plan Guide that includes an OPTIMIZE FOR hint, I’d see the Compiled Value as the defined value within the hint (in this case ‘London’). However, this time I’m seeing just the value of ‘Mentor’.

Conclusion

I don’t know what’s occurring in the internals. I just know the evidence I have in front of me. First, Query Store vs. Plan Guides? Query Store wins. However, internally, something is still going on. The Plan Guide is still successfully applied, at least according to what I can see. The processes that determine if a Plan Guide should be applied or not are not interfered with by the Query Store plan forcing. The fact that I’m seeing a Query Store plan and evidence of a successful Plan Guide though, that’s a problem that could cause issues when troubleshooting. Now you know, and knowing, as they say, is half the battle.


Let’s get together and talk about execution plans. In August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’m extremely excited to be in Oslo Norway for a pre-con before SQL Saturday Oslo in September.


/*NOTE: These scripts are not in the exact order in which
they must be run to re-create all my tests. These are
however, all the scripts you'll need.
*/
--the procedure
CREATE PROC dbo.AddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;
GO

--generating the first plan
EXEC dbo.AddressByCity
    @City = N'London';



--remove this from cache
DECLARE @PlanHandle VARBINARY(64);

SELECT  @PlanHandle = deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.AddressByCity');

IF @PlanHandle IS NOT NULL
    BEGIN
        DBCC FREEPROCCACHE(@PlanHandle);
    END
GO



--generate the second plan
EXEC dbo.AddressByCity
    @City = N'Mentor';




--get the query & plan id
SELECT  qsq.query_id,
        qsp.plan_id
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsq.object_id = OBJECT_ID('dbo.AddressByCity');


--force the plan
EXEC sys.sp_query_store_force_plan 414,460;

--unforce the plan
EXEC sys.sp_query_store_unforce_plan 414,460;




--Get the plan_handle and offset 
SELECT  deqs.plan_handle,
        deqs.statement_start_offset
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.objectid = OBJECT_ID('dbo.AddressByCity');

--create a plan guide
EXEC sys.sp_create_plan_guide_from_handle
    @name = N'PlanGuidePrecedence',
    @plan_handle = 0x05000500C26A7610A03A4F53A600000001000000000000000000000000000000000000000000000000000000,
    @statement_start_offset = 108;

--remove plan guide
EXEC sys.sp_control_plan_guide
    @operation = N'DROP',
    @name = N'PlanGuidePrecedence';


--pull a plan out of cache. You'll need to make sure the right plan is there first
DECLARE @xmlplan NVARCHAR(MAX)

SELECT  @xmlplan = deqp.query_plan
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_text_query_plan(deqs.plan_handle,DEFAULT,DEFAULT) AS deqp
WHERE   deqp.objectid = OBJECT_ID('dbo.AddressByCity');

SELECT @xmlplan


--copied and pasted the value from @xmlplan for easy reuse
DECLARE @xmlplan NVARCHAR(MAX) =
'<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.5" Build="13.0.1300.275"><BatchSequence><Batch><Statements><StmtSimple StatementText="CREATE PROC dbo.AddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City" StatementId="1" StatementCompId="3" StatementType="SELECT" StatementSqlHandle="0x0900D56A5A43E8B56B541DFF8DC13F149C6A0000000000000000000000000000000000000000000000000000" DatabaseContextSettingsId="1" ParentObjectId="276196034" StatementParameterizationType="0" RetrievedFromCache="true" StatementSubTreeCost="0.316799" StatementEstRows="434" SecurityPolicyApplied="false" StatementOptmLevel="FULL" QueryHash="0xDD75E124763781F2" QueryPlanHash="0x6C408B64C5E8A83F" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="120"><StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false"></StatementSetOptions><QueryPlan CachedPlanSize="32" CompileTime="3" CompileCPU="3" CompileMemory="360"><MissingIndexes><MissingIndexGroup Impact="89.7983"><MissingIndex Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]"><ColumnGroup Usage="EQUALITY"><Column Name="[City]" ColumnId="4"></Column></ColumnGroup></MissingIndex></MissingIndexGroup></MissingIndexes><MemoryGrantInfo SerialRequiredMemory="512" SerialDesiredMemory="720"></MemoryGrantInfo><OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="260900" EstimatedPagesCached="65225" EstimatedAvailableDegreeOfParallelism="2"></OptimizerHardwareDependentProperties><RelOp NodeId="0" PhysicalOp="Merge Join" LogicalOp="Inner Join" EstimateRows="434" EstimateIO="0" EstimateCPU="0.0069349" AvgRowSize="225" EstimatedTotalSubtreeCost="0.316799" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine2"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="Name"></ColumnReference></OutputList><Merge ManyToMany="0"><InnerSideJoinColumns><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference></InnerSideJoinColumns><OuterSideJoinColumns><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference></OuterSideJoinColumns><Residual><ScalarOperator ScalarString="[AdventureWorks2014].[Person].[StateProvince].[StateProvinceID] as [sp].[StateProvinceID]=[AdventureWorks2014].[Person].[Address].[StateProvinceID] as [a].[StateProvinceID]"><Compare CompareOp="EQ"><ScalarOperator><Identifier><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference></Identifier></ScalarOperator><ScalarOperator><Identifier><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" 
Column="StateProvinceID"></ColumnReference></Identifier></ScalarOperator></Compare></ScalarOperator></Residual><RelOp NodeId="1" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan" EstimateRows="181" EstimateIO="0.00386574" EstimateCPU="0.0003561" AvgRowSize="65" EstimatedTotalSubtreeCost="0.00422184" TableCardinality="181" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="Name"></ColumnReference></OutputList><IndexScan Ordered="1" ScanDirection="FORWARD" ForcedIndex="0" ForceSeek="0" ForceScan="0" NoExpandHint="0" Storage="RowStore"><DefinedValues><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="Name"></ColumnReference></DefinedValue></DefinedValues><Object Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Index="[PK_StateProvince_StateProvinceID]" Alias="[sp]" IndexKind="Clustered" Storage="RowStore"></Object></IndexScan></RelOp><RelOp NodeId="2" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="434" EstimateIO="0.0112613" EstimateCPU="0.0060321" AvgRowSize="177" EstimatedTotalSubtreeCost="0.30564" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine2"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference></OutputList><MemoryFractions Input="1" Output="1"></MemoryFractions><Sort Distinct="0"><OrderBy><OrderByColumn Ascending="1"><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference></OrderByColumn></OrderBy><RelOp NodeId="3" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan" EstimateRows="434" EstimateIO="0.257199" EstimateCPU="0.0217324" AvgRowSize="177" EstimatedTotalSubtreeCost="0.278931" TableCardinality="19614" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" 
Column="AddressLine2"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference></OutputList><IndexScan Ordered="0" ForcedIndex="0" ForceScan="0" NoExpandHint="0" Storage="RowStore"><DefinedValues><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine2"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference></DefinedValue></DefinedValues><Object Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Index="[PK_Address_AddressID]" Alias="[a]" IndexKind="Clustered" Storage="RowStore"></Object><Predicate><ScalarOperator ScalarString="[AdventureWorks2014].[Person].[Address].[City] as [a].[City]=[@City]"><Compare CompareOp="EQ"><ScalarOperator><Identifier><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference></Identifier></ScalarOperator><ScalarOperator><Identifier><ColumnReference Column="@City"></ColumnReference></Identifier></ScalarOperator></Compare></ScalarOperator></Predicate></IndexScan></RelOp></Sort></RelOp></Merge></RelOp><ParameterList><ColumnReference Column="@City" ParameterCompiledValue="N'London'"></ColumnReference></ParameterList></QueryPlan></StmtSimple></Statements></Batch></BatchSequence></ShowPlanXML>'


EXEC sys.sp_create_plan_guide
    @name = N'PlanGuidePrecedence',
    @stmt = N'SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;',
    @type = N'OBJECT',
    @module_or_batch = N'dbo.AddressByCity',
    @params = NULL,
    @hints = @xmlplan;


--create a plan guide with just a hint in it
EXEC sys.sp_create_plan_guide @name = 'PlanGuidePrecedence', -- sysname
    @stmt = N'SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;', -- nvarchar(max)
    @type = N'Object', -- nvarchar(60)
    @module_or_batch = N'dbo.AddressByCity', -- nvarchar(max)
    @params = NULL, -- nvarchar(max)
    @hints = N'OPTION(OPTIMIZE FOR(@City = ''London''))'; -- nvarchar(max)

May 16 2016

Query Store, Forced Plans, and New Plans

I love questions. I recently received one about new plans in the Query Store (available in Azure SQL Database now and in SQL Server 2016 after June 1).

Let’s say you have selected a plan that you want to force. You set it up. Now, let’s say the plan ages out of cache or even goes through a recompile. During the recompile, due to out of date statistics or skew in the statistics, you would, under normal circumstances, get a new plan. However, with Query Store and plan forcing, the plan that’s going to be used is the plan that is being forced. But, does that other plan, the one not used, get stored in Query Store?

I have no idea. Let’s find out.

The Setup

To start with, a small stored procedure that I use all the time for bad parameter sniffing demos that reliably gets different plans with different values due to statistics skew:

CREATE PROC dbo.spAddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;

If this procedure is called for a value of ‘London’, it gets a plan with a Merge Join. For most other values, it gets a plan with a Loops Join. Here’s an example of the ‘London’ plan:

2016-02-22_10-38-43

I’ve run both queries on my test system. I can check their existence in Query Store this way:

SELECT  CAST(qsp.query_plan AS XML),
        qsq.query_id,
        qsp.plan_id,
        qsp.is_forced_plan
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsq.object_id = OBJECT_ID('dbo.spAddressByCity');

That’s going to return two (2) rows:

newplanresults

Now, I’m going to remove the second plan, the ‘Mentor’ plan, and I’m going to force the ‘London’ plan with two statements:

EXEC sys.sp_query_store_remove_plan @plan_id = 219;

EXEC sys.sp_query_store_force_plan 2,2;

Now, when I execute the query, no matter what values I pass, I’m going to arrive at the ‘London’ plan. However, let’s pull the plan from cache in order to force a recompile:

DECLARE @PlanHandle varbinary(64);

SELECT  @PlanHandle = deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.spAddressByCity');

DBCC FREEPROCCACHE(@PlanHandle);

The Reveal

When I execute the query using the value of ‘Mentor’, do I get that plan stored in the Query Store? I know which plan will be in the cache: the ‘London’ plan, because I have chosen to force that plan. But, when I query the Query Store again to see what’s there for my stored procedure:

newplannegativeresults

The short answer is: nope. Any possible new plans that could have been stored in Query Store are not there. Yes, we have gone into the compile process but, due to plan forcing, that process gets a small bypass and the plan is simply applied from the Query Store. This happens even though you’re going to see a compile or recompile event if you capture them using Extended Events. For more fun on recompiles and the Query Store, see this earlier post of mine.

Keep those questions coming.


I love talking about query tuning. If you want to spend the day with me talking query tuning, you sure can. In August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’ll be doing a pre-conference seminar in Oslo Norway before SQL Saturday Oslo in September.

May 03 2016

Implicit Conversion and Performance

Letting SQL Server change data types automatically can seriously impact performance in a negative way. Because a conversion function has to be applied against the column in every row, you can’t get an index seek; instead, you’re forced to use a scan. I can demonstrate this pretty simply. Here’s a script that sets up a test table with three columns and three indexes and tosses a million rows in:

CREATE TABLE dbo.ConvertTest (
     BigIntColumn BIGINT NOT NULL,
     IntColumn INT NOT NULL,
     DateColumn VARCHAR(30)
    );

CREATE INDEX BigIntIndex 
ON dbo.ConvertTest 
(BigIntColumn);
CREATE INDEX IntIndex 
ON dbo.ConvertTest 
(IntColumn);
CREATE INDEX DateIndex 
ON dbo.ConvertTest
(DateColumn);

WITH    Nums
          AS (SELECT TOP (1000000)
                        ROW_NUMBER() OVER (ORDER BY (SELECT 1
                                                    )) AS n
              FROM      master.sys.all_columns ac1
              CROSS JOIN master.sys.all_columns ac2
             )
    INSERT  INTO dbo.ConvertTest
            (BigIntColumn,
             IntColumn,
             DateColumn
            )
    SELECT  Nums.n,
            Nums.n,
            DATEADD(HOUR, Nums.n, '1/1/1900')
    FROM    Nums;

If I run a very simple query like this:

DECLARE @param DATETIME;

SET @param = '3/15/1963 8:00:00';

SELECT  ct.DateColumn
FROM    dbo.ConvertTest AS ct
WHERE   ct.DateColumn = @param;

I get an execution plan that looks like this:

convert_scan

The issue is, while I put dates into the DateColumn, the data type there is VARCHAR, not DATETIME. Passing a DATETIME value results in an implicit conversion that we can see in the Index Scan Predicate property:

ConverImplicitHurtsPerformance

This is a function against a column and that is what results in the scan. If I modify the code as follows:

DECLARE @param VARCHAR(30);

SET @param = '3/15/1963 8:00:00';

SELECT  ct.DateColumn
FROM    dbo.ConvertTest AS ct
WHERE   ct.DateColumn = @param;

I get an execution plan that looks like this:

convert_seek

No change in data type is needed, so the index can be used. It’s a very simple demonstration of how implicit conversions hurt performance. Now, what about the integer columns? What happens when I run this code:

DECLARE @param INT;

SET @param = 650323;

SELECT  ct.BigIntColumn
FROM    dbo.ConvertTest AS ct
WHERE   ct.BigIntColumn = @param;

Here’s the execution plan:

convert_intseek

And here is the predicate:

convert_predicate

In this case, we’re still getting an implicit conversion of the data type, but SQL Server is smart enough to realize that, hey, it’s still just an integer, so it can convert the @param instead of the column, resulting in a seek. If I change the test so that it runs a query against the INT column passing a BIGINT value, it doesn’t even do a conversion of any kind.

In short, not all implicit conversions are a problem. However, it’s still a best practice to use the right data type for columns. You should also use the same data type for your parameters and local variables as the column data type. Taking these steps avoids issues with implicit conversion.
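
As a sketch of the long-term fix for the demo table, assuming every stored string converts cleanly to DATETIME (the index has to be dropped first because the column participates in it):

DROP INDEX DateIndex ON dbo.ConvertTest;

ALTER TABLE dbo.ConvertTest
ALTER COLUMN DateColumn DATETIME;

CREATE INDEX DateIndex
ON dbo.ConvertTest
(DateColumn);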


For lots more information on query tuning, in August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’m also going to be doing an all day training course before SQL Saturday Oslo in September.

Feb 24 2016

Query Store and Recompile

One of the many advantages of SQL Cruise is having enough time during a presentation to answer questions from the people there in great detail. One question came up while I was showing the new functionality of Query Store (available soon in SQL Server 2016, available right now in Azure SQL Database).

What happens to plan forcing when you have OPTION RECOMPILE on a query?

Great question. I have a favorite procedure I use to illustrate the functionality of parameter sniffing:

ALTER PROC dbo.AddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;

If this procedure is called with the value of ‘Mentor’ you get an execution plan that looks like this:

2016-02-22_10-48-00

If you remove that plan from cache and then call the procedure with the value of ‘London’ then the plan looks like this:

2016-02-22_10-38-43

It’s parameter sniffing in action. One of the cool pieces of functionality that comes with the Query Store is the ability to choose a plan and make that plan get used, regardless of the parameter values. You do this through the following command:

sys.sp_query_store_force_plan

You pass it a query_id and a plan_id from the information in the Query Store, and now that plan will get used. But what happens when you have attempted to fix your query using a query hint, such as RECOMPILE? Let’s modify our procedure:

ALTER PROC dbo.spAddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City
OPTION  (RECOMPILE);

If I run the procedure, regardless of the parameter values passed in, I get a new plan for each value. Nothing is stored in cache, but it is stored in the Query Store.

Let’s pull the query_id and plan_id out of Query Store for this procedure:

SELECT  qsq.query_id,
        qsp.plan_id,
        CAST(qsp.query_plan AS XML) AS sqlplan
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsq.object_id = OBJECT_ID('dbo.spAddressByCity');

Currently, on my system, this brings back four rows: two distinct query_id values and four different plan_id values. This is because, when I modified the text of the query within the procedure, the object_id stayed the same, but the query text is different (the RECOMPILE hint). That means a different record in the Query Store. We’ll use the newer query_id and pick one of the plans to force it:

EXEC sys.sp_query_store_force_plan
    42,
    44;

Now, no matter what value I pass it, the RECOMPILE hint is effectively ignored. I have the same plan every time (in my case, the London plan). Ah, but is the RECOMPILE hint ignored? In fact, no, it isn’t. If I capture the extended event sql_statement_recompile, I can see every single time I execute this procedure that it’s going through a full recompile… which is then tossed and the plan I chose is forced upon the optimizer. Every time.

This isn’t a revelation. It makes sense. However, the Query Store represents one more thing that we have to think through.


If you want to talk query tuning, execution plans, the Query Store, and more, I’ll be presenting an all day pre-conference seminar at SQLDay in Wroclaw Poland on May 16th, 2016. Let’s talk.