Sep 06 2016

Query Store, Force Plan and “Better” Plans

I am endlessly fascinated by how the Query Store works. I love teaching it at every opportunity too. Plus, almost every time I teach it, I get a new question about the behavior that makes me delve into the Query Store just a little bit more, enabling me to better understand how it works. I received just such a question at SQLSaturday Norway:

If you are forcing a plan, and the physical structure changes such that a “better” plan is possible, what happens with plan forcing?

Let’s answer a different question first. What happens when the plan gets invalidated, when the index being used gets dropped or some other structural change occurs so that the plan is no longer valid? I answered that question in this blog post. The plan being forced, after the object is dropped, becomes invalid, so that plan can no longer be used. The Query Store still attempts to apply the plan during any recompile or compile event of the query in question, but it fails and a proper plan is used. All this means, I think, the Query Store is going to ignore the new index, since a new index doesn’t invalidate an existing plan. A new index just makes new plans possible. However, when I was asked this question, this wasn’t something I had tested, so I gave a speculative, best guess, answer with plenty of caveats and the promise to provide a tested answer ASAP. Here we go.

I’ll start with the same sample query:

SELECT  sit.Quantity,
        sit.TransactionOccurredWhen,
        i.InvoiceDate,
        si.StockItemName
FROM    Warehouse.StockItemTransactions AS sit
JOIN    Sales.Invoices AS i
        ON i.InvoiceID = sit.InvoiceID
JOIN    Warehouse.StockItems AS si
        ON si.StockItemID = sit.StockItemID
WHERE   sit.TransactionOccurredWhen BETWEEN '3/1/2015'
                                    AND     '3/5/2015';

The results are returned in about 53ms with 4850 reads and this execution plan:

[Figure: the execution plan before the index is added]

As you can see, there’s the suggestion of a possible missing index. We’ll apply that in a moment. First though, I’m going to get the plan and query identifiers from the Query Store:

SELECT  qsq.query_id,
        qsp.plan_id,
        CAST(qsp.query_plan AS XML)
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  sit.Quantity,
        sit.TransactionOccurredWhen,%';

With this information, I’ll use Force Plan to ensure this is the plan used going forward.

EXEC sys.sp_query_store_force_plan 42995,487;

With that done, I’ll create the index:

CREATE INDEX TransactionOccurredWhen
ON Warehouse.StockItemTransactions
(TransactionOccurredWhen)
INCLUDE (StockItemID,InvoiceID,Quantity);

When I go through this same set of operations without plan forcing in place (run the query, identify a missing index, create the index, rerun the query), the execution plan above is replaced with one that uses the index and results in about 32ms execution time with 2942 reads, a significant improvement. You get a recompile event because the schema involved with the query has changed, and since a new index is now available, the recompiled plan uses it. What happens when you force the plan?

The recompile event after running CREATE INDEX still occurs. However, because we have elected to force a plan, that plan is what is used. In this instance, a recompile to a new plan would result in a faster query using fewer resources. However, as long as we’re forcing a plan and that plan stays valid, the plan will be forced.
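
If you want to keep an eye on whether a forced plan is still being applied, or whether forcing has started to fail, the Query Store catalog views expose that directly. Here’s a minimal sketch, using the same LIKE filter trick from earlier to find the query:

SELECT  qsp.plan_id,
        qsp.is_forced_plan,
        qsp.force_failure_count,
        qsp.last_force_failure_reason_desc
FROM    sys.query_store_plan AS qsp
JOIN    sys.query_store_query AS qsq
        ON qsq.query_id = qsp.query_id
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  sit.Quantity,%';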

In short, the behavior is exactly as I expected. Choosing to force a plan in the Query Store results in that plan being forced. While I think that the Query Store and plan forcing are wonderful new tools in our tool box, I am concerned that plan forcing will become far too easy a thing to implement. I worry that people will implement it without thinking through the implications and potential impacts.

It gets worse. If I change the query, let’s say I make it into a stored procedure and parameterize the query, and, instead of a very limited date range, I send in a whole month, the execution plan is quite different (with or without the index). Forcing the plan that is expecting less than 1,000 rows onto a query that is retrieving 10,000 rows results in pretty horrific performance. We really are going to have to be careful about using plan forcing appropriately because, as in so much of the rest of SQL Server, and in all of programming for that matter, the code is going to do exactly what we tell it do, whether that’s what we really want it to do or not.
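
If you do decide a forced plan has become a problem, such as in the parameterized, whole-month scenario above, backing it out is a single call. A sketch, reusing the example identifiers from earlier in this post:

EXEC sys.sp_query_store_unforce_plan 42995, 487;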

Aug 29 2016

Azure SQL Data Warehouse Execution Plans

Azure SQL Data Warehouse can sometimes feel like it’s completely different from SQL Server, but under the covers, it is still (mostly) SQL Server and it does have execution plans. Let’s take a look at one.

I’ve created a new SQL Data Warehouse using the sample database available on the portal, AdventureWorksDW. Here’s a query against that database:

SELECT dd.FullDateAlternateKey AS OrderDate,
dc.LastName,
SUM(fis.SalesAmount) AS SumSalesAmount
FROM dbo.FactInternetSales AS fis
JOIN dbo.DimDate AS dd
ON fis.OrderDateKey = dd.DateKey
JOIN dbo.DimCustomer AS dc
ON dc.CustomerKey = fis.CustomerKey
GROUP BY dd.FullDateAlternateKey,
dc.LastName
HAVING SUM(fis.SalesAmount) > 5000.0
ORDER BY OrderDate DESC;

If I attempt to capture an execution plan using the SQL Server Management Studio GUI, nothing happens. If I try to use T-SQL commands, I get an error that those commands are not supported with this version of SQL Server. Same thing if I try to capture a plan using Visual Studio. So… now what? Enter our new command:

EXPLAIN
SELECT  dd.FullDateAlternateKey AS OrderDate,
        dc.LastName,
        SUM(fis.SalesAmount) AS SumSalesAmount
FROM    dbo.FactInternetSales AS fis
JOIN    dbo.DimDate AS dd
        ON fis.OrderDateKey = dd.DateKey
JOIN    dbo.DimCustomer AS dc
        ON dc.CustomerKey = fis.CustomerKey
GROUP BY dd.FullDateAlternateKey,
        dc.LastName
HAVING  SUM(fis.SalesAmount) > 5000.0
ORDER BY OrderDate DESC;

If I run this through the SSMS query window, I get a syntax error. So we’re now in Visual Studio. This is how we generate an execution plan from within Azure SQL Data Warehouse. What you get is XML output in the results like this:

[Figure: the XML results of the EXPLAIN command]

According to the documentation on EXPLAIN, I should be able to click on the XML and it will open up to explore. In my version of Visual Studio (2015), I didn’t find that to be the case. Instead I had to copy and paste the XML into an XML file window that I created within Visual Studio. This is what I finished with:

<?xml version="1.0" encoding="utf-8"?>
<dsql_query number_nodes="1" number_distributions="60" number_distributions_per_node="60">
  <sql>SELECT  dd.FullDateAlternateKey AS OrderDate,          dc.LastName,          SUM(fis.SalesAmount) AS SumSalesAmount  FROM    dbo.FactInternetSales AS fis  JOIN    dbo.DimDate AS dd          ON fis.OrderDateKey = dd.DateKey  JOIN    dbo.DimCustomer AS dc          ON dc.CustomerKey = fis.CustomerKey  GROUP BY dd.FullDateAlternateKey,          dc.LastName  HAVING  SUM(fis.SalesAmount) &gt; 5000.0  ORDER BY OrderDate DESC</sql>
  <dsql_operations total_cost="5.98868068474576" total_number_operations="13">
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_14</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>
        <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_14] ([DateKey] INT NOT NULL, [FullDateAlternateKey] DATE NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="BROADCAST_MOVE">
      <operation_cost cost="1.99584" accumulative_cost="1.99584" average_rowsize="7" output_rows="1188" GroupNumber="12" />
      <source_statement>SELECT [T1_1].[DateKey] AS [DateKey],         [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey]  FROM   [DWTest].[dbo].[DimDate] AS T1_1</source_statement>
      <destination_table>[TEMP_ID_14]</destination_table>
    </dsql_operation>
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_15</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_15] ([OrderDateKey] INT NOT NULL, [CustomerKey] INT NOT NULL, [SalesAmount] MONEY NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="SHUFFLE_MOVE">
      <operation_cost cost="3.93098847457627" accumulative_cost="5.92682847457627" average_rowsize="16" output_rows="60398" GroupNumber="10" />
      <source_statement>SELECT [T1_1].[OrderDateKey] AS [OrderDateKey],         [T1_1].[CustomerKey] AS [CustomerKey],         [T1_1].[SalesAmount] AS [SalesAmount]  FROM   [DWTest].[dbo].[FactInternetSales] AS T1_1</source_statement>
      <destination_table>[TEMP_ID_15]</destination_table>
      <shuffle_columns>CustomerKey;</shuffle_columns>
    </dsql_operation>
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_16</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_16] ([FullDateAlternateKey] DATE NOT NULL, [LastName] NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS, [col] MONEY NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="SHUFFLE_MOVE">
      <operation_cost cost="0.0618522101694915" accumulative_cost="5.98868068474576" average_rowsize="111" output_rows="136.985" GroupNumber="25" />
      <source_statement>SELECT [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey],         [T1_1].[LastName] AS [LastName],         [T1_1].[col] AS [col]  FROM   (SELECT   SUM([T2_2].[SalesAmount]) AS [col],                   [T2_1].[FullDateAlternateKey] AS [FullDateAlternateKey],                   [T2_2].[LastName] AS [LastName]          FROM     [tempdb].[dbo].[TEMP_ID_14] AS T2_1                   INNER JOIN                   (SELECT [T3_2].[OrderDateKey] AS [OrderDateKey],                           [T3_2].[SalesAmount] AS [SalesAmount],                           [T3_1].[LastName] AS [LastName]                    FROM   [DWTest].[dbo].[DimCustomer] AS T3_1                           INNER JOIN                           [tempdb].[dbo].[TEMP_ID_15] AS T3_2                           ON ([T3_2].[CustomerKey] = [T3_1].[CustomerKey])) AS T2_2                   ON ([T2_1].[DateKey] = [T2_2].[OrderDateKey])          GROUP BY [T2_1].[FullDateAlternateKey], [T2_2].[LastName]) AS T1_1</source_statement>
      <destination_table>[TEMP_ID_16]</destination_table>
      <shuffle_columns>FullDateAlternateKey;</shuffle_columns>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_15]</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_14]</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="RETURN">
      <location distribution="AllDistributions" />
      <select>SELECT   [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey],           [T1_1].[LastName] AS [LastName],           [T1_1].[col] AS [col]  FROM     (SELECT [T2_1].[FullDateAlternateKey] AS [FullDateAlternateKey],                   [T2_1].[LastName] AS [LastName],                   [T2_1].[col] AS [col]            FROM   (SELECT   SUM([T3_1].[col]) AS [col],                             [T3_1].[FullDateAlternateKey] AS [FullDateAlternateKey],                             [T3_1].[LastName] AS [LastName]                    FROM     [tempdb].[dbo].[TEMP_ID_16] AS T3_1                    GROUP BY [T3_1].[FullDateAlternateKey], [T3_1].[LastName]) AS T2_1            WHERE  ([T2_1].[col] &gt; CAST ((5000.0) AS DECIMAL (5, 1)))) AS T1_1  ORDER BY [T1_1].[FullDateAlternateKey] DESC</select>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_16]</sql_operation>
      </sql_operations>
    </dsql_operation>
  </dsql_operations>
</dsql_query>

So now we just save this as a .sqlplan file and open it in SSMS, right?

Nope!

See, that’s not a regular execution plan, at all. Instead, it’s a D-SQL plan. It’s not the same as our old execution plans. You can’t open it as a graphical plan (and no, not even in that very popular 3rd party tool, I tried). You will have to learn how to read these plans differently because, well, they are different.
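
If you want to see what actually ran, rather than the estimated D-SQL plan, the data warehouse exposes the steps through its own dynamic management views. A rough sketch; the request_id value here is hypothetical and would come from sys.dm_pdw_exec_requests:

SELECT  step_index,
        operation_type,
        location_type,
        status,
        total_elapsed_time
FROM    sys.dm_pdw_request_steps
WHERE   request_id = 'QID1234'; -- hypothetical request id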

Aug 15 2016

Query Store, Force Plan and Dropped Objects

I love the Query Store. Seriously. It’s a huge leap forward in the capabilities of Azure SQL Database and SQL Server in support of performance monitoring and query optimization. One of my favorite aspects of the Query Store is the ability to force plans. Frankly though, it’s also the scariest part of the Query Store. I do believe that plan forcing will be one of the most ill-used functions in SQL Server since the multi-statement table-valued user-defined function (don’t get me started). However, unlike the UDF, this ill-use will be because of poor understanding on the part of the user, not a fundamental design issue. No, plan forcing and the Query Store are very well constructed. Let me give you an example of just how well constructed they are.

Let’s imagine that you have a situation, such as bad parameter sniffing, where you’ve determined that, out of the multiple possible execution plans for a query, there is one preferred plan. Enabling plan forcing to ensure that plan gets used is a no-brainer. Let’s further imagine that you have a junior DBA who is… let’s just say overly aggressive in their duties, such that they do silly things occasionally. What happens when your pretty plan, which uses a particular index, meets your junior DBA who just dropped that index?

Here’s the setup. We’re using the WideWorldImporters database and we have this query:

SELECT  *
FROM    Warehouse.StockItemTransactions AS sit
WHERE   sit.TransactionOccurredWhen BETWEEN '9/9/2015'
                                    AND     '9/11/2015';

This query, with the default configuration, will scan the existing table, so I’ll add an index:

CREATE INDEX TransactionOccurredWhenNCI
ON Warehouse.StockItemTransactions
(TransactionOccurredWhen);

For a limited range such as the one I’m passing above, I’ll get a plan with a key lookup operation which runs faster than the scan, so I’m happy. For a broader range, I’m likely to see a scan again, but since most of my queries have a very narrow range, I’d sure like to be able to force the plan to always compile to the seek and key lookup. To do this I need to find the query_id and plan_id from the Query Store (assuming I’m not using the GUI):

SELECT  qsp.plan_id,
        qsp.query_id,
		qsqt.query_sql_text,
		qsp.count_compiles
FROM    sys.query_store_plan AS qsp
JOIN    sys.query_store_query AS qsq
        ON qsq.query_id = qsp.query_id
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  *
FROM    Warehouse.StockItemTransactions AS sit%';

With those values, I can force the execution plan so that it will always use the plan I want:

EXEC sys.sp_query_store_force_plan 42460,463;

That’s it. I’m happy because I’m going to see the execution plan used over and over, despite any values passed during a recompile.

Then…

Along comes our aggressive junior DBA who decides that there are “too many” indexes on the server. No, I don’t know what that means either, but they evidently read it on the internet or something so they drop the index we created before:

DROP INDEX TransactionOccurredWhenNCI ON Warehouse.StockItemTransactions;

What now happens to our lovely execution plan and the plan forcing? We’ll take a look at two events in Extended Events, sql_statement_recompile and query_store_plan_forcing_failed. Nothing happens immediately on dropping the index. The plans associated with that object, if any, are marked as invalid in the cache. The next time we call the query it’s going to recompile and we can see the event:

[Figure: the sql_statement_recompile event]

The most important part of the event is the recompile_cause, which is “Schema changed”. However, I would also note the attach_activity_id.guid. I’ve chosen to enable causality tracking in this Extended Events session. This causes all events associated with a common activity to get a GUID and a sequence number. This is interesting because, after the recompile event, we get the query_store_plan_forcing_failed event:

[Figure: the query_store_plan_forcing_failed event]

The guid value is the same as the event above and the *.seq number is now 2, showing that, for these events, the recompile event occurred and then this event occurred. That makes perfect sense. The plan is marked for recompile, so, it’s going to be recompiled. I have enabled plan forcing though, so I have a particular plan that I want the optimizer to use. However, thanks to my “helpful” junior DBA, the plan is now invalid. You even get the description of what happened in the message field for the event:

Index ‘WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhenNCI’, specified in the USE PLAN hint, does not exist. Specify an existing index, or create an index with the specified name.
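
In case you want to reproduce this, here’s roughly the shape of the Extended Events session I used to capture both events with causality tracking enabled. The session name, action, and target are my choices, not requirements:

CREATE EVENT SESSION PlanForcing ON SERVER
ADD EVENT sqlserver.sql_statement_recompile
    (ACTION (sqlserver.sql_text)),
ADD EVENT sqlserver.query_store_plan_forcing_failed
    (ACTION (sqlserver.sql_text))
ADD TARGET package0.event_file
    (SET filename = N'PlanForcing')
WITH (TRACK_CAUSALITY = ON);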

The first question now is, what happens with this query and the execution plan? Does the new plan generated now that the index is missing get stored in cache? Yes, it does. We can validate that by querying the cache, or, when capturing the actual execution plan, checking the “Retrieved from cache” property.
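
Here’s one way to do that cache check, a quick sketch using the plan cache DMVs and the same LIKE trick as before:

SELECT  decp.usecounts,
        decp.cacheobjtype,
        dest.text
FROM    sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE   dest.text LIKE 'SELECT  *
FROM    Warehouse.StockItemTransactions AS sit%';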

Because plan forcing is enabled, do we see a recompile every time this query is called? The answer to that question is slightly complex. Under normal circumstances, no. As long as that plan remains in cache, it’s simply reused. No other recompiles occur. A normal recompile event will cause another attempt at applying the invalid execution plan and we would see yet another query_store_plan_forcing_failed event for each recompile on the query. However, during testing, Joey D’Antoni (who was helping me play with this when we discussed what would happen when a plan was made invalid) had severe memory pressure on his server. He saw intermittent recompiles with a cause message that said plan forcing had failed. So if your server is under extreme stress and you cause this issue, you might see different messages. Just remember, the cause of the recompiles was not the plan forcing, but the memory pressure.

The fun thing is, as long as I don’t remove the plan forcing or take the query and plan out of the Query Store manually, if I recreate the index on my table with the same name and definition as that expected by the plan, the Query Store will simply reapply the plan and then successfully force it during any subsequent recompile situation. This is because Query Store is persisted with the database and barring outside activity, the information there will remain, just like the rest of the data in the database.
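
In other words, simply rerunning the original index creation puts everything back, and the forced plan starts working again on the next compile:

CREATE INDEX TransactionOccurredWhenNCI
ON Warehouse.StockItemTransactions
(TransactionOccurredWhen);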

All of this means that Query Store works exactly the way we would expect, not forcing additional recompiles when you, or your junior DBA, inadvertently invalidate a plan. It also works as expected in that forcing a plan is stored with your database so that, assuming you don’t remove that plan from the Query Store, it will simply be reapplied after you fix the problem. It’s fun to see the thought that went behind the design of the behavior of Query Store. However, please, use plan forcing judiciously.

Jul 18 2016

Common Table Expression, Just a Name

The Common Table Expression (CTE) is a great tool in T-SQL. The CTE provides a mechanism to define a query that can be easily reused over and over within another query. The CTE also provides a mechanism for recursion which, though a little dangerous and overused, is extremely handy for certain types of queries. However, the CTE has a very unfortunate name. Over and over I’ve had to walk people back from the “Table” in Common Table Expression. The CTE is just a query. It’s not a table. It’s not providing a temporary storage space like a table variable or a temporary table. It’s just a query. Think of it more like a temporary view, which is also just a query.

Every time I explain this, there are people who don’t believe me. They point to the “Table” in the name, “See. Says so right there. It’s a table.”

It’s not and I can prove it. Let’s create a relatively simple CTE and use it in a query:

WITH    MyCTE
          AS (SELECT    c.CustomerName,
                        cc.CustomerCategoryName
              FROM      Sales.Customers AS c
              JOIN      Sales.CustomerCategories AS cc
              ON        cc.CustomerCategoryID = c.CustomerCategoryID
              WHERE     c.CustomerCategoryID = 4)
    SELECT  *
    FROM    MyCTE;

Now, I’m going to run the query within the CTE and the CTE together as two statements in a batch and capture the execution plans:

[Figure: the two identical execution plans]

On the top, the CTE, on the bottom, the query. You’ll note that the execution plans are identical. They each have the exact same Query Plan Hash value in the properties, 0x88EFD2B7C165E667, even though they have different Query Hash values, 0x192FFC125A08CC35 and 0xFEB7F2BCAC853CD5, respectively. Further, if I capture the query metrics using extended events, I get identical reads and, on average, identical execution times:

[Figure: identical reads and durations captured by Extended Events]

This is because there is no table being created. The data is not treated differently. A CTE is just a query, not some type of temporary storage.
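
If you want to prove it to yourself another way, rewrite the CTE as a derived table. This sketch should produce exactly the same execution plan as the two statements above:

SELECT  *
FROM    (SELECT c.CustomerName,
                cc.CustomerCategoryName
         FROM   Sales.Customers AS c
         JOIN   Sales.CustomerCategories AS cc
         ON     cc.CustomerCategoryID = c.CustomerCategoryID
         WHERE  c.CustomerCategoryID = 4) AS MyCTE;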

Heck, let’s do one more thing. Let’s use the latest SSMS plan comparison tool and highlight one of the operators to see what differences there are internally in the plan:

[Figure: the SSMS plan comparison]

I don’t see a lot of differences. In fact, I don’t see any. That’s because the optimizer recognizes these two queries as identical. If it was loading data into temporary storage, you would see differences in something. We don’t. This is because, despite the somewhat unfortunate emphasis that gets placed on the Table portion of the name, the emphasis of the name, Common Table Expression, should be on the word Expression.

I will point out an interesting difference, especially useful for those who plug in CTEs everywhere, whether it’s needed or not. Let’s look at the properties of the two plans:

[Figure: the properties of the two plans]

You can see the similarities and differences that I pointed out earlier in the Statement, Query Hash and Query Plan Hash, as well as the Estimated Subtree Cost and others. What’s truly interesting is that the CompileCPU, CompileMemory and CompileTime for the CTE are higher than for the regular query. While the CTE is just a query, it’s a query that adds a non-zero overhead when used and, therefore, should only be used where appropriate (good gosh, I’ve seen people put it EVERYWHERE, on every single query, don’t do that).

Hopefully, this is enough to establish, truly, completely, and thoroughly, that the Common Table Expression is an expression, not a table.

Yeah, I did this before, but it keeps coming up, so I tried a different approach. Let’s see if the word gets out. Your Common Table Expression is not a table.


I love talking about execution plans and query tuning. I’ll be doing just that at an all-day pre-con at the SQLServerGeeks Annual Summit in Bangalore, India.

Don’t miss your chance to attend an all-day training course on execution plans before SQLSaturday Oslo in September.

Feb 24 2016

Query Store and Recompile

One of the many advantages of SQL Cruise is the ability to have enough time during a presentation to be able to answer questions from the people there in great detail. One question came up while I was showing the new functionality of Query Store (available soon in SQL Server 2016, available right now in Azure SQL Database).

What happens to plan forcing when you have OPTION RECOMPILE on a query?

Great question. I have a favorite procedure I use to illustrate the functionality of parameter sniffing:

ALTER PROC dbo.AddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;

If this procedure is called with the value of ‘Mentor’ you get an execution plan that looks like this:

[Figure: the execution plan for ‘Mentor’]

If you remove that plan from cache and then call the procedure with the value of ‘London’ then the plan looks like this:

[Figure: the execution plan for ‘London’]

It’s parameter sniffing in action. One of the cool pieces of functionality that comes with the Query Store is the ability to choose a plan and make that plan get used, regardless of the parameter values. You do this through the following command:

sys.sp_query_store_force_plan

You pass it a query_id and a plan_id from the information in the Query Store and now that plan will get used. But, what happens when you have attempted to fix your query using a query hint, such as RECOMPILE? Let’s modify our procedure:

ALTER PROC dbo.spAddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City
OPTION  (RECOMPILE);

If I run the procedure, regardless of the parameter values passed in, I get a new plan for each value. Nothing is stored in cache, but, it is stored in the Query Store.

Let’s pull the query_id and plan_id out of Query Store for this procedure:

SELECT  qsq.query_id,
        qsp.plan_id,
        CAST(qsp.query_plan AS XML) AS sqlplan
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsq.object_id = OBJECT_ID('dbo.spAddressByCity');

Currently, on my system, this brings back four rows, two distinct query_id and four different plan_id. This is because, when I modified the text of the query within the procedure, I kept the object_id the same, but, the query text is different (the RECOMPILE hint). This means a different record in the Query Store. We’ll use the newer query_id and pick one of the plans to force it:

EXEC sys.sp_query_store_force_plan
    42,
    44;

Now, no matter what value I pass it, the RECOMPILE hint is effectively ignored. I have the same plan every time (in my case, the London plan). Ah, but is the RECOMPILE hint ignored? In fact, no, it isn’t. If I capture the extended event sql_statement_recompile, I can see every single time I execute this procedure that it’s going through a full recompile… which is then tossed and the plan I chose is forced upon the optimizer. Every time.

This isn’t a revelation. It makes sense. However, the Query Store represents one more thing that we have to think through.


If you want to talk query tuning, execution plans, the Query Store, and more, I’ll be presenting an all day pre-conference seminar at SQLDay in Wroclaw Poland on May 16th, 2016. Let’s talk.

Aug 10 2015

Generating Estimated Plan and the Plan Cache

Does generating an Estimated Plan cause that plan to be loaded into the plan cache?

No.

What? Still here? You want more? Proof? Fine. Let’s first run this bit of code (but please, not on your production server):

DBCC FREEPROCCACHE();

That will remove all plans from cache. Now, let’s take this query and generate an Estimated Plan (Ctrl-L from your keyboard or by clicking on the “Display Estimated Execution Plan” button on the toolbar):

SELECT * FROM Production.ProductModel AS pm;

This will generate a trivial plan showing a scan against the Production.ProductModel table. Now, let’s run another query:

SELECT  deqs.plan_handle
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text = 'SELECT * FROM Production.ProductModel AS pm;';

That’s just an easy way to see if a plan_handle exists. If a plan was stored in cache for this query, I should see a result. I don’t. Now, you might say that this is because it’s a trivial plan. So, let’s complicate the query a little, add some JOINs and a WHERE clause:

SELECT  *
FROM    Production.Product AS p
JOIN    Production.ProductModel AS pm
        ON pm.ProductModelID = p.ProductModelID
JOIN    Production.ProductInventory AS pi
        ON pi.ProductID = p.ProductID
JOIN    Production.Location AS l
        ON l.LocationID = pi.LocationID
WHERE   p.ProductID = 750;

Not the most complicated query around, but, it will go through full optimization because it is not a query that will generate a trivial plan. Let’s generate an Estimated Plan again. Now we’ll modify our query against the cache just a little:

SELECT  deqs.plan_handle
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text = 'SELECT  *
FROM    Production.Product AS p
JOIN    Production.ProductModel AS pm
        ON pm.ProductModelID = p.ProductModelID
JOIN    Production.ProductInventory AS pi
        ON pi.ProductID = p.ProductID
JOIN    Production.Location AS l
        ON l.LocationID = pi.LocationID
WHERE   p.ProductID = 750;'

You won’t see any rows returned. Now, if I execute the query and then re-run the query against the cache, then I see a plan. How about stored procedures? I have a really simple one I use to teach parameter sniffing:

EXEC dbo.spAddressByCity
    @City = N'Mentor';

I’ll generate an Estimated Plan and then query the cache using a slightly different approach:

SELECT  deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.spAddressByCity');

Again, assuming I’ve cleared the cache and I haven’t executed the procedure, this returns zero rows. That’s because generating an Estimated Plan does not then load that plan into the cache. The plan gets loaded into cache when it gets executed.
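
To watch the plan actually land in cache, execute the procedure and then rerun the same check:

EXEC dbo.spAddressByCity
    @City = N'Mentor';

SELECT  deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.spAddressByCity');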


I love talking about execution plans and query tuning. If you do too, we have two opportunities to get together. The first is in Las Vegas at the Connections conference. Click here to register. If you bring your execution plans to Vegas, I’ll try to make some time during the session to use your plans to demonstrate to the crowd. The second opportunity will be down in San Diego, the day before their SQL Saturday event.

Aug 03 2015

Differences Between Actual & Estimated Plans

I have, in the past, made way too much of the need for Actual Plans when doing performance troubleshooting. The primary reason for this is to get the Actual Plan in order to see the differences between the Actual and Estimated Row Counts as a means of understanding how the optimizer saw the data. But, is that the only thing that’s different between Actual & Estimated Plans? Well, pretty much, yeah.

I took two fairly average execution plans from SQL Server 2014 and ran them through Altova’s XML Spy, which does XML comparisons similar to how Redgate SQL Compare will compare two data structures for you. Here is every single difference I found. Everything was additional information in the Actual Plan.

In the information for the first operator, in my case, a SELECT operator, in the QueryPlan element, two properties:

DegreeOfParallelism
MemoryGrant

In the MemoryGrantInfo element, six properties:

RequiredMemory
DesiredMemory
RequestedMemory
GrantWaitTime
GrantedMemory
MemoryUsed

Then, in all the operators, in each RelOp element in the XML of the execution plan, one additional XML element, RuntimeInformation with these properties:

RuntimeCountersPerThread
ActualRows
ActualEndOfScans
ActualExecutions

This list is not exhaustive. I used a relatively benign query (it hit about 10 tables, had some suggested missing indexes, and about 40 operators all told). I’m sure if I tried lots of different types of queries, I might find a few additional differences. But for your average “Hey, Let’s Tune This” query, we’re looking at one additional element in the operators and twelve (12) additional properties, only four of which are in the majority of the operators.

Don’t get me wrong. If I have the choice, I want to see the Actual Plan, because the Actual/Estimated row count and execution count comparisons are important. However, they are not the be-all and end-all of reading execution plans and query tuning. This means, if you can get the Estimated Plan, it’s probably good enough most of the time. Heck, you can look at the estimates and then look at the data and statistics directly to get a sense of whether they’re accurate or not. The only time you’re likely to see any differences (other than those noted above) between an Actual and Estimated Plan is in the event of a recompile. It’s also worth noting that the plan you’re going to retrieve from the cache is going to be an Estimated Plan.
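
Since what’s in cache is an Estimated Plan anyway, you can pull it straight from there. A minimal sketch:

SELECT  dest.text,
        deqp.query_plan
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp;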

Do not be afraid of the Estimated Plan.


If you want to spend a lot of quality time with me, talking execution plans and query tuning, I’ve got a couple of different all day pre-conference seminars coming up. The first is in Las Vegas at the Connections conference. Click here to register. The second will be down in San Diego, the day before their SQL Saturday event.

Dec 19 2014

“Pretty Plans vs. Performance” or “Grant Gets Pwned”

If you get an execution plan that looks like this:

[Figure: a large, complex execution plan]

I wouldn’t blame you for immediately thinking about query tuning. Especially if the code that generated it looks like this:

SELECT  soh.OrderDate,
        sod.OrderQty,
        sod.LineTotal
FROM    Sales.SalesOrderHeader AS soh
INNER JOIN Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
WHERE   soh.SalesOrderID IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10,
                             @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18,
                             @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26,
                             @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34,
                             @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42,
                             @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50,
                             @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58,
                             @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66,
                             @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74,
                             @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82,
                             @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90,
                             @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98,
                             @p99);

Let’s replace this with a table variable, maybe even one passed in as a parameter. The plan then looks like this:

[Figure: the simpler plan using a table variable]

Ah, much prettier. I’m happy now, all’s right with the world… But, just in case, let’s look at performance. The first query ran in about 2.2ms and had 599 reads. The second query ran in about 24ms and had 598 reads… crud.

Well, let’s modify everything again. Instead of a table variable, we’ll use a temporary table and get some statistics into this puppy which will clean things right up. Here’s the new plan:

[Figure: the plan using a temporary table]

Looks pretty familiar, although there are slight differences in the cost estimates between this plan and the preceding one. But the run time is 85ms with 714 reads AND I had to create the temporary table which added time to the whole thing.

Doggone it, that other plan is heinous and ugly and so is the query which uses an IN clause. Shouldn’t the cleaner, simpler, execution plan be an indicator that we’re going to get better performance?

Nope.

The thing is, just because an execution plan is simple and easy to understand does not mean it’s going to perform well. You can’t simply look at an execution plan to understand performance. You have to measure the run times of the query, look at the resources it’s going to use in order to understand where waits are likely, look at its reads, and take all of this into account, along with understanding what the execution plan is doing, in order to make the appropriate choices for performance on your system.

UPDATE:

I kept working with this because I was convinced I could get faster performance. The main difference as I saw it was that the optimizer sorted the data in the IN clause and I wasn’t explicitly sorting the data in any of my replacement queries. But nothing I did resulted in better execution times. And that was upsetting.

First, when you’re tuning a query, you’re going to look at the execution plans, as I did above. But, when you want to measure the performance of queries, it’s a very good idea to turn off execution plans and just capture the query metrics. I knew this and was doing it, and you could see the results in the Extended Events session where I was capturing each statement for the SPID I was working within. I also had SET STATISTICS IO and SET STATISTICS TIME enabled for the query. Since each execution would cause those to fire as part of the statements, and those were making my Extended Events window messy, I decided to turn them off… WHOA! Query execution times radically changed.
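
For reference, the statement capture looked roughly like this. It’s a sketch; the session name is my choice and the session_id value is hypothetical, so substitute your own SPID:

CREATE EVENT SESSION QueryPerformance ON SERVER
ADD EVENT sqlserver.sql_statement_completed
    (WHERE (sqlserver.session_id = 53)) -- 53 is a hypothetical SPID
ADD TARGET package0.event_file
    (SET filename = N'QueryPerformance');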

In fact, my first attempt at tuning the query, substituting a table parameter, was suddenly faster than the original. The fastest was when I pre-sorted the data in a temporary table (discounting the costs of sorting and inserting the data into the temp table just for the moment). In fact, the prettiest plan was indeed the fastest.

Experimenting further, it was the STATISTICS IO that completely changed the execution times.

In short, pay no attention to my original post above, instead, let the lesson be that I need to be very cautious about the Observer Effect.

Jun 25 2014

The Utility of Execution Plans in Natively Compiled Procedures

I’m actually having problems identifying the utility of execution plans when working with natively compiled procedures. Or, put another way, why bother? I’ve posted a couple of times on natively compiled procedures and SQL Server execution plans. I’ve found the differences interesting and enlightening, but I’m seriously questioning why I should bother, at least currently. I’m sure there will be many changes to the behaviors of the natively compiled procedures and their relationship with execution plans. But right now, well, let’s look at an example. I have three simple tables stored in-memory. Here’s the definition of one:

CREATE TABLE dbo.Address
    (
     AddressID INT IDENTITY(1, 1)
                   NOT NULL
                   PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
     AddressLine1 NVARCHAR(60) NOT NULL,
     AddressLine2 NVARCHAR(60) NULL,
     City NVARCHAR(30) COLLATE Latin1_General_100_BIN2 NOT NULL,
     StateProvinceID INT NOT NULL,
     PostalCode NVARCHAR(15) NOT NULL,
     ModifiedDate DATETIME
        NOT NULL
        CONSTRAINT DF_Address_ModifiedDate DEFAULT (GETDATE())
    )
    WITH (MEMORY_OPTIMIZED = ON);

I can then create the following code as a natively compiled procedure:

CREATE PROC [dbo].[AddressDetails] @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

When I call for an estimated plan (remember, no actual plans) I’ll get this:

[Figure: the estimated plan showing an index scan]

If you click on it, you’ll note that there’s an index scan. But the costs are all zero. Everything is FREE! Or not. The execution time is 93ms. If I put an index on the City column, the execution plan changes to the one I showed previously, an index seek, and the execution time goes to 42ms. Clearly, the scans are costing something. Scans aren’t necessarily bad and seeks aren’t necessarily good, but it’s hard to spot issues with execution plans with no costing involved at all. Which makes me wonder: should we bother with execution plans for natively compiled procs? I’m honestly unsure.

For most query tuning, statistics matter a lot. I understand we still have room in Albany on July 25th. You can register here. I’m doing an all day session at SQL Connections in September in Las Vegas. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. I’d love to talk query tuning with you all day long.


Jun 17 2014

Natively Compiled Procedures and Bad Execution Plans

I’ve been exploring how natively compiled procedures are portrayed within execution plans. There have been two previous posts on the topic, the first discussing the differences in the first operator, the second discussing the differences everywhere else. Now, I’m really interested in generating bad execution plans. But, interestingly enough, I wasn’t able to; or rather, I couldn’t see evidence of plans changing based on silly things I did to my queries and data. To start with, here’s a procedure:

CREATE PROC [dbo].[AddressDetails] @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

And this is a nearly identical procedure, but with some stupid stuff put in:

CREATE PROC [dbo].[BadAddressDetails] @City VARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

I’ve changed the primary filter parameter to VARCHAR where the underlying data is NVARCHAR. This difference is likely to lead to differences in an execution plan, although not necessarily. If I load my tables up and update my statistics, then create the procedures and run them both with the same parameter values, I should detect any differences, right? Here’s the resulting execution plan:

[Figure: the execution plan, identical for both procedures]

It’s an identical plan for both queries. In fact, the only difference I can find is a CAST in the Index Seek operator for the BadAddressDetails procedure, as expected. But that one difference didn’t cause the plans to diverge anywhere else. However, execution is something else entirely. And this is where things get a little strange. There are two ways to execute a procedure:

EXEC dbo.AddressDetails @City = 'London';
EXEC dbo.AddressDetails 'London';

Interestingly enough, the first one is considered to be the slow way of passing a parameter. The second one is the preferred mechanism for natively compiled procedures. Now, if I execute these two versions of calling the procedure, I actually see different performance. The first call, the slow one, runs somewhere in the neighborhood of 342 µs. The other ran in about 255 µs. Granted, we’re only talking about ~100 µs, but we’re also talking about a 25% speed increase, and that’s HUGE! But that’s not the weird bit. The weird bit was that when I ran the good and bad queries together, the slow call on the bad query was consistently faster than the slow call on the good query. The fast call reversed that trend. And, speaking of which, the bad query, with the CAST, ran in about 356 µs, or ~25% slower.

The execution plan really didn’t show any indication that this would be slower, which made me do the next thing I did. I updated my Address table so that all the values were equal to ‘London.’ Then, because statistics are not maintained on in-memory tables automatically, I updated the statistics:

UPDATE STATISTICS dbo.Address WITH FULLSCAN, NORECOMPUTE;

With the statistics up to date, I dropped and recreated the procedure (there is no recompile with natively compiled procedures, something to keep in mind… maybe, more in a second). So now, the selectivity on the index was 1. The most likely outcome, an index scan. Guess what happened? Nothing. The execution plan was the same. I then went nuts, I converted all my tables so that a horrific mishmash of data would be brought back instead of clean data sets and I put data conversions in and… nothing. Index Seeks and Nested Loops joins. Weirdness.

I’m actually unsure why this is happening. I’m going to do more experimenting with it to try to figure out what’s up. But, that lack of recompile, maybe it doesn’t matter if, regardless of data distribution, you’re going to get the same plan anyway. I’m really not positive that looking at the execution plan for natively compiled procedures does much of anything right now. However, these tests were a little bit subtle. I’ll load up more data, get a more complex query and then really mess around with the code to see what happens. I’ll post more of my experiments soon.

I promise not to experiment on you though when I’m teaching my all day query tuning seminars. There are a bunch coming up, so if you’re interested in learning more, here’s where to go.  Just a couple of days left before Louisville and I’m not sure if there’s room or not, but it’s happening on the 20th of June. Go here to register. Albany will be on July 25th, but we’re almost full there as well. You can register here. SQL Connections is a pretty cool event that takes place in September in Las Vegas. In addition to regular sessions I’ll be presenting an all-day session on query tuning on the Friday of the event. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. Let’s get together and talk.