May 31 2016

Precedence Goes to Query Store or Plan Guide?

While presenting on the Query Store at SQLDay in Wroclaw, Poland, I was asked a pretty simple question: which takes precedence, the Query Store or a Plan Guide?

One of my favorite answers to questions is “I don’t know” because it gives me the opportunity to learn. Let’s figure this one out together.

I’ll post the code to recreate this experiment within AdventureWorks at the end of the article, because the code for forcing execution plans using Plan Guides can be pretty doggone long (fair warning: you may need to generate your own XML from a plan on your own system).

I have a stored procedure that I use frequently to demonstrate parameter sniffing and bad parameter sniffing, AddressByCity (listed below). When passed the value ‘London’, this query generates a plan with a Merge Join. When passed the value ‘Mentor’, it generates a plan with a Loops Join. These different plans are caused by the statistics for the different values: 434 rows for ‘London’ and 1 row for ‘Mentor’. To start the test, I’ll generate a plan for each value, dropping the plan from cache between executions, so that I have both plans in the Query Store. Now we’re ready for phase 1.

Phase 1

I’m going to remove the current plan from cache. I will then use sp_query_store_force_plan to select the plan with the Loops Join. I’ll run the query using ‘London’ to validate that the plan generated is NOT the plan for ‘London’ but rather the plan for ‘Mentor’. Done, and I can see the ‘Use plan’ results in the Properties of the SELECT operator:

[Figure: QSvPG_UsePlan]

I’ve highlighted a couple of interesting points. First, even though we are using a plan that was compiled for the value of ‘Mentor’, this execution was compiled using the value ‘London’, so that’s the parameter value that gets stored, despite the plan used. You can then see, down at the bottom, that the ‘Use plan’ property is set to true. Now let’s force the plan using a Plan Guide (in my script below). To be very sure what happens, I’ll again clear the plan from cache.

If the Query Store wins, I should see a plan with a Loops Join. If the Plan Guide wins, I should see a plan with a Merge Join. The moment of truth:

[Figure: QSvPG_Plan]

Query Store wins! Let’s see how it shows in the Properties:

[Figure: QSvPG_PlanGuideName]

You’ll see that the Parameter Compile & Runtime values are the same and that the ‘Use plan’ property is the same. Note, however, that the PlanGuideName property is in play; the indication is that the optimizer thinks a Plan Guide has been successfully applied. In fact, if I watch Extended Events for the plan_guide_successful event (the best way to check how Plan Guides are being used, apart from looking at the execution plans), it shows success for the Plan Guide we created.
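If you want to watch for that event yourself, here’s a minimal Extended Events session sketch. The session name is arbitrary and this wasn’t part of my original test scripts; I’m also adding plan_guide_unsuccessful since it’s handy when troubleshooting:

--Watch plan guide application server-wide (view with Watch Live Data in SSMS)
CREATE EVENT SESSION PlanGuideWatch
ON SERVER
    ADD EVENT sqlserver.plan_guide_successful,
    ADD EVENT sqlserver.plan_guide_unsuccessful;
GO

ALTER EVENT SESSION PlanGuideWatch ON SERVER STATE = START;
GO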

Phase 2

Precedence could matter. First, I remove the Plan Guide using sp_control_plan_guide. Then, I remove the Query Store force using sp_query_store_unforce_plan. I’ll also clear the plan out of cache again, just to be sure. Next, I create the Plan Guide. I’ll run the procedure using the value ‘Mentor’ to ensure that I see the Merge/‘London’ plan, the PlanGuideName property in that plan, and a successful application of the Plan Guide from the plan_guide_successful Extended Event. All good.

Now, I once again force the plan from the Query Store. Rerun the query…

The winner is the Query Store.

Phase 3

OK, so, if we try to force plans using a Plan Guide, the Query Store wins. What about if I’m just trying to apply a hint? Let’s clear everything up again and then create an OPTIMIZE FOR query hint in a Plan Guide. Simple, right?

Query Store wins again… Except things get a little odd. Take a look at the Properties now:

[Figure: QSvPG_Parameter]

Normally, with a Plan Guide that includes an OPTIMIZE FOR hint, I’d see the Compiled Value as the defined value within the hint (in this case ‘London’). However, this time I’m seeing just the value of ‘Mentor’.

Conclusion

I don’t know what’s occurring in the internals. I just know the evidence I have in front of me. First, Query Store vs. Plan Guides? Query Store wins. However, internally, something is still going on. The Plan Guide is still successfully applied, at least according to what I can see; the processes that determine whether a Plan Guide should be applied are not interfered with by Query Store plan forcing. The fact that I’m seeing a Query Store plan alongside evidence of a successful Plan Guide, though, is something that could cause confusion when troubleshooting. Now you know, and knowing, as they say, is half the battle.


Let’s get together and talk about execution plans. In August, I’ll be doing an all-day pre-con at the SQLServer Geeks Annual Summit in Bangalore, India.

I’m extremely excited to be in Oslo, Norway, for a pre-con before SQL Saturday Oslo in September.


/*NOTE: These scripts are not in the exact order in which
they must be run to re-create all my tests. These are,
however, all the scripts you'll need.
*/
--the procedure
CREATE PROC dbo.AddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;
GO

--generating the first plan
EXEC dbo.AddressByCity
    @City = N'London';



--remove this from cache
DECLARE @PlanHandle VARBINARY(64);

SELECT  @PlanHandle = deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.AddressByCity');

IF @PlanHandle IS NOT NULL
    BEGIN
        DBCC FREEPROCCACHE(@PlanHandle);
    END
GO



--generate the second plan
EXEC dbo.AddressByCity
    @City = N'Mentor';




--get the query & plan id
SELECT  qsq.query_id,
        qsp.plan_id
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsq.object_id = OBJECT_ID('dbo.AddressByCity');


--force the plan
EXEC sys.sp_query_store_force_plan 414,460;

--unforce the plan
EXEC sys.sp_query_store_unforce_plan 414,460;




--Get the plan_handle and offset 
SELECT  deqs.plan_handle,
        deqs.statement_start_offset
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.objectid = OBJECT_ID('dbo.AddressByCity');

--create a plan guide
EXEC sys.sp_create_plan_guide_from_handle
    @name = N'PlanGuidePrecedence',
    @plan_handle = 0x05000500C26A7610A03A4F53A600000001000000000000000000000000000000000000000000000000000000,
    @statement_start_offset = 108;

--remove plan guide
EXEC sys.sp_control_plan_guide
    @operation = N'DROP',
    @name = N'PlanGuidePrecedence';


--pull a plan out of cache. You'll need to make sure the right plan is there first
DECLARE @xmlplan NVARCHAR(MAX)

SELECT  @xmlplan = deqp.query_plan
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_text_query_plan(deqs.plan_handle,DEFAULT,DEFAULT) AS deqp
WHERE   deqp.objectid = OBJECT_ID('dbo.AddressByCity');

SELECT @xmlplan


--copied and pasted the value from @xmlplan for easy reuse
DECLARE @xmlplan NVARCHAR(MAX) =
'<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.5" Build="13.0.1300.275"><BatchSequence><Batch><Statements><StmtSimple StatementText="CREATE PROC dbo.AddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City" StatementId="1" StatementCompId="3" StatementType="SELECT" StatementSqlHandle="0x0900D56A5A43E8B56B541DFF8DC13F149C6A0000000000000000000000000000000000000000000000000000" DatabaseContextSettingsId="1" ParentObjectId="276196034" StatementParameterizationType="0" RetrievedFromCache="true" StatementSubTreeCost="0.316799" StatementEstRows="434" SecurityPolicyApplied="false" StatementOptmLevel="FULL" QueryHash="0xDD75E124763781F2" QueryPlanHash="0x6C408B64C5E8A83F" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="120"><StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false"></StatementSetOptions><QueryPlan CachedPlanSize="32" CompileTime="3" CompileCPU="3" CompileMemory="360"><MissingIndexes><MissingIndexGroup Impact="89.7983"><MissingIndex Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]"><ColumnGroup Usage="EQUALITY"><Column Name="[City]" ColumnId="4"></Column></ColumnGroup></MissingIndex></MissingIndexGroup></MissingIndexes><MemoryGrantInfo SerialRequiredMemory="512" SerialDesiredMemory="720"></MemoryGrantInfo><OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="260900" EstimatedPagesCached="65225" EstimatedAvailableDegreeOfParallelism="2"></OptimizerHardwareDependentProperties><RelOp NodeId="0" PhysicalOp="Merge Join" LogicalOp="Inner Join" EstimateRows="434" EstimateIO="0" EstimateCPU="0.0069349" AvgRowSize="225" EstimatedTotalSubtreeCost="0.316799" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine2"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="Name"></ColumnReference></OutputList><Merge ManyToMany="0"><InnerSideJoinColumns><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference></InnerSideJoinColumns><OuterSideJoinColumns><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference></OuterSideJoinColumns><Residual><ScalarOperator ScalarString="[AdventureWorks2014].[Person].[StateProvince].[StateProvinceID] as [sp].[StateProvinceID]=[AdventureWorks2014].[Person].[Address].[StateProvinceID] as [a].[StateProvinceID]"><Compare CompareOp="EQ"><ScalarOperator><Identifier><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference></Identifier></ScalarOperator><ScalarOperator><Identifier><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" 
Column="StateProvinceID"></ColumnReference></Identifier></ScalarOperator></Compare></ScalarOperator></Residual><RelOp NodeId="1" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan" EstimateRows="181" EstimateIO="0.00386574" EstimateCPU="0.0003561" AvgRowSize="65" EstimatedTotalSubtreeCost="0.00422184" TableCardinality="181" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="Name"></ColumnReference></OutputList><IndexScan Ordered="1" ScanDirection="FORWARD" ForcedIndex="0" ForceSeek="0" ForceScan="0" NoExpandHint="0" Storage="RowStore"><DefinedValues><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="StateProvinceID"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Alias="[sp]" Column="Name"></ColumnReference></DefinedValue></DefinedValues><Object Database="[AdventureWorks2014]" Schema="[Person]" Table="[StateProvince]" Index="[PK_StateProvince_StateProvinceID]" Alias="[sp]" IndexKind="Clustered" Storage="RowStore"></Object></IndexScan></RelOp><RelOp NodeId="2" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="434" EstimateIO="0.0112613" EstimateCPU="0.0060321" AvgRowSize="177" EstimatedTotalSubtreeCost="0.30564" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine2"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference></OutputList><MemoryFractions Input="1" Output="1"></MemoryFractions><Sort Distinct="0"><OrderBy><OrderByColumn Ascending="1"><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference></OrderByColumn></OrderBy><RelOp NodeId="3" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan" EstimateRows="434" EstimateIO="0.257199" EstimateCPU="0.0217324" AvgRowSize="177" EstimatedTotalSubtreeCost="0.278931" TableCardinality="19614" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row"><OutputList><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" 
Column="AddressLine2"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference></OutputList><IndexScan Ordered="0" ForcedIndex="0" ForceScan="0" NoExpandHint="0" Storage="RowStore"><DefinedValues><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressID"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine1"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="AddressLine2"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="StateProvinceID"></ColumnReference></DefinedValue><DefinedValue><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="PostalCode"></ColumnReference></DefinedValue></DefinedValues><Object Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Index="[PK_Address_AddressID]" Alias="[a]" IndexKind="Clustered" Storage="RowStore"></Object><Predicate><ScalarOperator ScalarString="[AdventureWorks2014].[Person].[Address].[City] as [a].[City]=[@City]"><Compare CompareOp="EQ"><ScalarOperator><Identifier><ColumnReference Database="[AdventureWorks2014]" Schema="[Person]" Table="[Address]" Alias="[a]" Column="City"></ColumnReference></Identifier></ScalarOperator><ScalarOperator><Identifier><ColumnReference Column="@City"></ColumnReference></Identifier></ScalarOperator></Compare></ScalarOperator></Predicate></IndexScan></RelOp></Sort></RelOp></Merge></RelOp><ParameterList><ColumnReference Column="@City" ParameterCompiledValue="N'London'"></ColumnReference></ParameterList></QueryPlan></StmtSimple></Statements></Batch></BatchSequence></ShowPlanXML>'


EXEC sys.sp_create_plan_guide
    @name = N'PlanGuidePrecedence',
    @stmt = N'SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;',
    @type = N'OBJECT',
    @module_or_batch = N'dbo.AddressByCity',
    @params = NULL,
    @hints = @xmlplan;


--create a plan guide with just a hint in it
EXEC sys.sp_create_plan_guide @name = 'PlanGuidePrecedence', -- sysname
    @stmt = N'SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;', -- nvarchar(max)
    @type = N'Object', -- nvarchar(60)
    @module_or_batch = N'dbo.AddressByCity', -- nvarchar(max)
    @params = NULL, -- nvarchar(max)
    @hints = N'OPTION(OPTIMIZE FOR(@City = ''London''))'; -- nvarchar(max)

May 16 2016

Query Store, Forced Plans, and New Plans

I love questions. I recently received one about new plans in the Query Store (available in Azure SQL Database now and in SQL Server 2016 after June 1).

Let’s say you have selected a plan that you want to force. You set it up. Now, let’s say the plan ages out of cache or even goes through a recompile. During the recompile, due to out of date statistics or skew in the statistics, you would, under normal circumstances, get a new plan. However, with Query Store and plan forcing, the plan that’s going to be used is the plan that is being forced. But, does that other plan, the one not used, get stored in Query Store?

I have no idea. Let’s find out.

The Setup

To start with, a small stored procedure that I use all the time for bad parameter sniffing demos that reliably gets different plans with different values due to statistics skew:

CREATE PROC dbo.spAddressByCity @City NVARCHAR(30)
AS
SELECT  a.AddressID,
        a.AddressLine1,
        a.AddressLine2,
        a.City,
        sp.Name AS StateProvinceName,
        a.PostalCode
FROM    Person.Address AS a
JOIN    Person.StateProvince AS sp
        ON a.StateProvinceID = sp.StateProvinceID
WHERE   a.City = @City;

If this procedure is called for a value of ‘London’, it gets a plan with a Merge Join. For most other values it gets a plan with a Loops Join. Here’s an example of the ‘London’ plan:

[Figure: 2016-02-22_10-38-43]

I’ve run both queries on my test system. I can check their existence in Query Store this way:

SELECT  CAST(qsp.query_plan AS XML),
		qsq.query_id,
		qsp.plan_id,
		qsp.is_forced_plan
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsq.object_id = OBJECT_ID('dbo.spAddressByCity');

That’s going to return two (2) rows:

[Figure: newplanresults]

Now, I’m going to remove the second plan, the ‘Mentor’ plan, and I’m going to force the ‘London’ plan with two statements:

EXEC sys.sp_query_store_remove_plan @plan_id = 219;

EXEC sys.sp_query_store_force_plan 2,2;

Now, when I execute the query, no matter what values I pass, I’m going to arrive at the ‘London’ plan. However, let’s pull the plan from cache in order to force a recompile:

DECLARE @PlanHandle varbinary(64);

SELECT  @PlanHandle = deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.spAddressByCity');

DBCC FREEPROCCACHE(@PlanHandle);

The Reveal

When I execute the query using the value of ‘Mentor’, do I get that plan stored in the Query Store? I know which plan will be in the cache: the ‘London’ plan, because I have chosen to force that plan. But when I query the Query Store again to see what’s there for my stored procedure:

[Figure: newplannegativeresults]

The short answer is, nope. Any possible new plans that could have been stored in the Query Store are not there. Yes, we have gone into the compile process, but, due to plan forcing, that process gets a small bypass and the plan is simply applied from the Query Store. This is true even though you’re going to see a compile or recompile event if you capture them using Extended Events. For more fun on recompiles and the Query Store, see this earlier post of mine.
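If you want to capture those events yourself, here’s a minimal Extended Events sketch; the session name is mine and it wasn’t part of the original test:

--Capture statement recompiles server-wide
CREATE EVENT SESSION RecompileWatch
ON SERVER
    ADD EVENT sqlserver.sql_statement_recompile;
GO

ALTER EVENT SESSION RecompileWatch ON SERVER STATE = START;
GO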

Keep those questions coming.


I love talking about query tuning. If you want to spend the day with me talking query tuning, you sure can. In August, I’ll be doing an all-day pre-con at the SQLServer Geeks Annual Summit in Bangalore, India.

I’ll be doing a pre-conference seminar in Oslo, Norway, before SQL Saturday Oslo in September.

May 12 2016

Use The Correct Data Type

Blog post #5 in support of Tim Ford’s (b|t) #iwanttohelp, #entrylevel. Read about it here.

Saying that you should use the correct data type seems like something that should be very straightforward. Unfortunately, it’s very easy for things to get confusing. Let’s take a simple example from AdventureWorks. If I run this query:

SELECT  a.ModifiedDate
FROM    Person.Address AS a
WHERE   a.AddressID = 42;

The output looks like this:

2009-01-20 00:00:00.000

Normal, right? You see the year, the month, and the day, followed by the time in hours, minutes, and seconds as a decimal. Ah, but there is an issue. This query is supposed to be for the reporting system, and the business only cares about the date that the values in the Person.Address table were modified, so they don’t want to see the time. Also, the person in charge is kind of picky. They really don’t like seeing the date formatted that way. They prefer to see “January 20, 2009.”

Far too often, then, the easy answer is to just change the column to VARCHAR. You can trim the time and output the date in exactly the format needed by the business. Problem solved, and it was easy…

Well, until someone inputs "Janry 20, 2009", slightly mangling the spelling, and suddenly your report looks all messed up. Or they ask you to start filtering on just the last two weeks, regardless of when the report was run, and you now can’t easily do date math on the column. Even after you get past that problem with a little formatting using CAST (along with ISDATE to try to catch all those other typos that are in the system now), you notice that the performance is really slow, so you go to put an index on the column, and now you have an index key that is 50 bytes wide instead of the 3 bytes that the DATE data type would have been, making the index less efficient (not to mention that sorting the data is going to put February ahead of January, requiring still more formatting).

The list goes on and on. You’re going to hit issue after issue, and all you tried to do was a little formatting, making the data look pretty by using the VARCHAR data type in place of DATETIME or DATE. This kind of thing happens all the time, especially to people just getting started. The rule should be very simple: use the correct data type for the data you’re collecting. However, it’s easy to get distracted. It’s easy to get pushed. The business says "put it in this format" and the only way you can do that is by changing the data type so…

Don’t do it. Take the time to understand the implications of mangling the data types. It’s not just date and time that cause issues, either. Lots and lots of stuff can easily be stuffed into a VARCHAR field that more properly should be an INT or a DECIMAL or some other data type. Be able to communicate with the business so that they understand you’re not simply being difficult; you’re trying to protect the functionality and integrity of the data. Show how formatting can be done using client tools. Explain all the downstream issues that this decision will have. Then use the correct data type for the information being stored.
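As one illustration of formatting at presentation time rather than in the table, here’s a hedged sketch against the earlier query; FORMAT requires SQL Server 2012 or later, and the format string is my assumption of the requested layout:

--Keep the column DATETIME; produce 'January 20, 2009' only in the output
SELECT  FORMAT(a.ModifiedDate, N'MMMM d, yyyy') AS ModifiedDateDisplay
FROM    Person.Address AS a
WHERE   a.AddressID = 42;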

May 03 2016

Implicit Conversion and Performance

Letting SQL Server change data types automatically can seriously impact performance in a negative way. Because a calculation has to be run against the column, you can’t get an index seek; instead, you’re forced to use a scan. I can demonstrate this pretty simply. Here’s a script that sets up a test table with three columns and three indexes and tosses a million rows in:

CREATE TABLE dbo.ConvertTest (
     BigIntColumn BIGINT NOT NULL,
     IntColumn INT NOT NULL,
     DateColumn VARCHAR(30)
    );

CREATE INDEX BigIntIndex 
ON dbo.ConvertTest 
(BigIntColumn);
CREATE INDEX IntIndex 
ON dbo.ConvertTest 
(IntColumn);
CREATE INDEX DateIndex 
ON dbo.ConvertTest
(DateColumn);

WITH    Nums
          AS (SELECT TOP (1000000)
                        ROW_NUMBER() OVER (ORDER BY (SELECT 1
                                                    )) AS n
              FROM      master.sys.all_columns ac1
              CROSS JOIN master.sys.all_columns ac2
             )
    INSERT  INTO dbo.ConvertTest
            (BigIntColumn,
             IntColumn,
             DateColumn
            )
    SELECT  Nums.n,
            Nums.n,
            DATEADD(HOUR, Nums.n, '1/1/1900')
    FROM    Nums;

If I run a very simple query like this:

DECLARE @param DATETIME;

SET @param = '3/15/1963 8:00:00';

SELECT  ct.DateColumn
FROM    dbo.ConvertTest AS ct
WHERE   ct.DateColumn = @param;

I get an execution plan that looks like this:

[Figure: convert_scan]

The issue is, while I put dates into the DateColumn, the data type there is VARCHAR, not DATETIME. Passing a DATETIME value results in an implicit conversion that we can see in the Index Scan Predicate property:

[Figure: ConverImplicitHurtsPerformance]

This is a function against a column and that is what results in the scan. If I modify the code as follows:

DECLARE @param VARCHAR(30);

SET @param = '3/15/1963 8:00:00';

SELECT  ct.DateColumn
FROM    dbo.ConvertTest AS ct
WHERE   ct.DateColumn = @param;

I get an execution plan that looks like this:

[Figure: convert_seek]

No change in data type is needed, so the index can be used. It’s a very simple demonstration of how implicit conversions hurt performance. Now, what about the integer columns? What happens when I run this code:

DECLARE @param INT;

SET @param = 650323;

SELECT  ct.BigIntColumn
FROM    dbo.ConvertTest AS ct
WHERE   ct.BigIntColumn = @param;

Here’s the execution plan:

[Figure: convert_intseek]

And here is the predicate:

[Figure: convert_predicate]

In this case, we’re still getting an implicit conversion of the data type, but SQL Server is smart enough to realize that, hey, it’s still just an integer, so it can convert @param instead of the column, resulting in a seek. If I change the test so that it runs a query against the INT column passing a BIGINT value, it doesn’t even do a conversion of any kind.
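For reference, that last test looks something like this, reconstructed from the description above against the same test table:

DECLARE @param BIGINT;

SET @param = 650323;

--query the INT column while passing a BIGINT value
SELECT  ct.IntColumn
FROM    dbo.ConvertTest AS ct
WHERE   ct.IntColumn = @param;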

In short, not all implicit conversions are a problem. However, it’s still a best practice to use the right data type for columns. You should also use the same data type for your parameters and local variables as the column data type. Taking these steps avoids issues with implicit conversion.


For lots more information on query tuning, in August I’ll be doing an all-day pre-con at the SQLServer Geeks Annual Summit in Bangalore, India.

I’m also going to be doing an all-day training course before SQL Saturday Oslo in September.

Apr 12 2016

A View Is Not A Table

Blog post #4 in support of Tim Ford’s (b|t) #iwanttohelp, #entrylevel

In SQL Server, in the T-SQL you use to query it, a view looks just like a table (I’m using the AdventureWorks2014 database for all these examples):

SELECT  *
FROM    Production.vProductAndDescription AS vpad;


SELECT  vpad.Name,
        vpad.Description,
        vpmi.Instructions
FROM    Production.vProductAndDescription AS vpad
JOIN    Production.Product AS p
        ON p.ProductID = vpad.ProductID
JOIN    Production.vProductModelInstructions AS vpmi
        ON vpmi.ProductModelID = p.ProductModelID
WHERE   vpad.ProductID = 891
        AND vpad.CultureID = 'fr';

The above query actually combines two views and a table. This is what is commonly referred to as a "code smell". A code smell is a coding practice that works, but that can lead to problems; in this case, we’re talking about performance problems. The performance problems when using views to join to tables and other views as if they were real tables come about because a standard view is not a table. It’s a query. For example, the second view introduced, vProductModelInstructions, looks like this:

ALTER VIEW [Production].[vProductModelInstructions] 
AS 
SELECT 
    [ProductModelID] 
    ,[Name] 
    ,[Instructions].value(N'declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions"; 
        (/root/text())[1]', 'nvarchar(max)') AS [Instructions] 
    ,[MfgInstructions].ref.value('@LocationID[1]', 'int') AS [LocationID] 
    ,[MfgInstructions].ref.value('@SetupHours[1]', 'decimal(9, 4)') AS [SetupHours] 
    ,[MfgInstructions].ref.value('@MachineHours[1]', 'decimal(9, 4)') AS [MachineHours] 
    ,[MfgInstructions].ref.value('@LaborHours[1]', 'decimal(9, 4)') AS [LaborHours] 
    ,[MfgInstructions].ref.value('@LotSize[1]', 'int') AS [LotSize] 
    ,[Steps].ref.value('string(.)[1]', 'nvarchar(1024)') AS [Step] 
    ,[rowguid] 
    ,[ModifiedDate]
FROM [Production].[ProductModel] 
CROSS APPLY [Instructions].nodes(N'declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions"; 
    /root/Location') MfgInstructions(ref)
CROSS APPLY [MfgInstructions].ref.nodes('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions"; 
    step') Steps(ref);

GO

That’s a query against the XML stored in the ProductModel table. The view was created to mask the complexity of the necessary XPath code, while providing a mechanism for retrieving the data from the XML. This is a common use of views. However, when we then treat the view as a table, and join it to other tables and views, we present a problem for the optimizer. Because a view is not a table, but is instead a query, the optimizer has to resolve this query in combination with any other views or tables to arrive at an execution plan for the whole combined mess. While the optimizer is very good at what it does, the complexity caused by the additional, unnecessary processing to figure out which parts of the view are not needed to satisfy the query means it can make poor choices. That can result in poor performance.

If I were to rewrite the query, it would look something like this:

SELECT  p.Name,
        pd.Description,
        pm.Instructions.value(N'declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions"; 
        (/root/text())[1]', 'nvarchar(max)') AS Instructions
FROM    Production.Product AS p
JOIN    Production.ProductModelProductDescriptionCulture AS pmpdc
        ON pmpdc.ProductModelID = p.ProductModelID
JOIN    Production.ProductDescription AS pd
        ON pd.ProductDescriptionID = pmpdc.ProductDescriptionID
JOIN    Production.ProductModel AS pm
        ON pm.ProductModelID = p.ProductModelID
CROSS APPLY Instructions.nodes(N'declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions"; 
    /root/Location') MfgInstructions (ref)
CROSS APPLY MfgInstructions.ref.nodes('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions"; 
    step') Steps (ref)
WHERE   p.ProductID = 891
        AND pmpdc.CultureID = 'fr';

That’s a lot more complex than the query we had above that only referenced three objects and had only two JOIN operations. However, if you capture the I/O and the execution time for these queries, you’ll see a different story.

I used two methods for measuring performance. For one set of tests I used SET STATISTICS IO and SET STATISTICS TIME; for another set I used Extended Events. Consistently, the execution time for the query with the views was around 110ms; the query that didn’t reference any views was around 37ms. The reads were 155 for the query with views, but only 109 for the query without. If you look at the individual table I/O, you can start to see where the differences come from.
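The STATISTICS setup, for reference, is about as minimal as it gets:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

--run the query under test here

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

These are the results from the query with the views: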

Table ‘ProductDescription’. Scan count 0, logical reads 56
Table ‘ProductModelProductDescriptionCulture’. Scan count 28, logical reads 56
Table ‘xml_index_nodes_418100530_256001’. Scan count 13, logical reads 37
Table ‘ProductModel’. Scan count 0, logical reads 2
Table ‘Product’. Scan count 0, logical reads 4

These are the results for the query without the view:

Table ‘ProductDescription’. Scan count 0, logical reads 56
Table ‘xml_index_nodes_418100530_256001’. Scan count 13, logical reads 37
Table ‘ProductModelProductDescriptionCulture’. Scan count 6, logical reads 12
Table ‘ProductModel’. Scan count 0, logical reads 2
Table ‘Product’. Scan count 0, logical reads 2

You can see the differences in both ProductModelProductDescriptionCulture and Product. This is because of the differences in the execution plans caused by differences in the choices made by the optimizer.

A standard view is not a table. There is such a thing as a materialized, or indexed, view, which is a table, but that’s not what we’re talking about here. While you can use a view as if it were a table, don’t mistake it for one. A view is just a mask in front of a query. It can’t be used like an object simply so that you can avoid rewriting the same JOINs; that will lead to issues for the optimizer, as this simple set of examples showed. Don’t shy away from using views, just understand what their real behavior is. A view is a query, not a table.

Apr 05 2016

Views and Simplification

I’ve been getting lots of questions on views lately. Must be something in the water.

Because SQL Server allows you to treat a view as if it were a table, lots of people pretty much assume that it is a table, since they get to treat it that way. The thing is, a view is not a table. It’s a query. Let’s explore this just a little bit. Here’s a relatively straightforward view:

CREATE VIEW dbo.PersonInfo
AS
SELECT  a.AddressLine1,
        a.City,
        a.PostalCode,
        a.SpatialLocation,
        p.FirstName,
        p.LastName,
        be.BusinessEntityID,
        bea.AddressID,
        bea.AddressTypeID
FROM    Person.Address AS a
JOIN    Person.BusinessEntityAddress AS bea
        ON a.AddressID = bea.AddressID
JOIN    Person.BusinessEntity AS be
        ON bea.BusinessEntityID = be.BusinessEntityID
JOIN    Person.Person AS p
        ON be.BusinessEntityID = p.BusinessEntityID;
GO

I can query this view like this:

SELECT  *
FROM    dbo.PersonInfo AS pni
WHERE   pni.LastName LIKE 'Ran%';

The resulting execution plan looks like this:

[Figure: viewSimple1]

You don’t even need to expand it for what I’m about to show. If we modify the query against our view as follows:

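For illustration, a query along these lines, selecting only columns that originate in the Person table, produces the simplified plan (the exact modified query isn’t shown here, so treat this as a plausible reconstruction):

SELECT  pni.FirstName,
        pni.LastName
FROM    dbo.PersonInfo AS pni
WHERE   pni.LastName LIKE 'Ran%';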
[Figure: viewSimple2]

Again, you can expand these, but you don’t need to. Notice that the first plan had four tables being referenced, which represent the four tables from the view. The second query only has two tables. This is because the optimizer looked at the query that the view represents, not simply the query that I used to call the view. It then recognized that simplification could be used to eliminate unnecessary JOIN operations from the execution plan and still return the same data, because of the foreign key constraints on the tables.

The important point to note is that the optimizer is absolutely not treating the view like a table. The optimizer is treating the view like a query, which is all it is. This has both positive and negative impacts when it comes to tuning queries against this view. You could spend all sorts of time "tuning" the view, only to find all that tuning tossed out the window when a query doesn’t reference a column in the view and that causes the optimizer to rearrange the plan. I don’t want to convey that this is an issue. It’s not. I’m just trying to emphasize the point that a view is just a query.

Now, when we get into treating a view exactly like a table in JOINs, or calling a view from a view (known as nesting), then we’re talking about issues. I’ll put up another post on JOINs and views.


For lots more information on query tuning, I’m presenting an all-day pre-con at SQLDay in Wroclaw, Poland, on May 16.

Mar 28 2016

Query Store and Optimize For Ad Hoc

I love presenting sessions because you get so many interesting questions. For example, what happens with Optimize for Ad Hoc when Query Store is enabled? Great question. I didn’t have the answer, so, on to testing.

For those who don’t know, Optimize for Ad Hoc is a mechanism for dealing with lots and lots of ad hoc queries. When it’s enabled, instead of storing an execution plan the first time a query is called, a plan stub (basically the identifying mechanisms for the plan) is stored in cache. This reduces the amount of space wasted in your cache. The second time the query is called, the full plan is stored in cache.

I’m going to set up Optimize for Ad Hoc and Query Store and, to clean the slate, I’ll remove everything from cache and clear out the Query Store, just in case:

EXEC sys.sp_configure
    N'show advanced options',
    N'1';
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure
    N'optimize for ad hoc workloads',
    N'1';
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure
    N'show advanced options',
    N'0';
RECONFIGURE WITH OVERRIDE;
GO
DBCC FREEPROCCACHE();
GO
USE AdventureWorks2014;
GO
ALTER DATABASE AdventureWorks2014 SET QUERY_STORE = ON;
GO
ALTER DATABASE AdventureWorks2014 SET QUERY_STORE CLEAR;
GO

Then, we just need an ad hoc query:

SELECT  p.Name,
        soh.OrderDate,
        sod.OrderQty
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Road-750 Black, 48'
        AND sod.OrderQty > 10;

If I run this query one time in my cleaned up environment, I can check to see if Optimize For Ad Hoc is working by querying the cache:

SELECT  dest.text,
        deqs.execution_count,
        deqp.query_plan
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
WHERE   dest.text LIKE 'SELECT  p.Name,%';

The results look like this:

[Figure: adhoc]

So, what’s in the Query Store? We’ll use this query:

SELECT  qsqt.query_sql_text,
        qsq.count_compiles,
        CAST(qsp.query_plan AS XML)
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  p.Name,%';

The results look like this:

[Figure: adhocquerystore]

In short, the plan is stored in the Query Store, even though the plan isn’t stored in cache. Now, this has implications. I’m not saying they’re good and I’m not saying they’re bad, but there are implications. If you’re in a situation where you need to use Optimize For Ad Hoc to help manage your cache, you’re possibly going to see negative impacts on your Query Store, since it’s going to capture all the plans that you avoided caching. There are, however, mechanisms for managing Query Store behavior.

I’m going to modify my own Query Store to change the capture behavior from "All" to "Automatic." This enables an internal filtering mechanism, defined by Microsoft, to eliminate some captures. When I reset everything and run the example ad hoc query one time, I get the plan stub in cache, but nothing in the Query Store (that I can see). I run the ad hoc query again and now I get a plan in the cache, but still nothing in the Query Store. If I run the ad hoc query a third time, there’s a counter somewhere (I haven’t found it yet), because I suddenly get a query in the Query Store.
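For reference, the capture-mode change looks like this (the SQL Server 2016 syntax):

ALTER DATABASE AdventureWorks2014
SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);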

For a bit more information, let’s modify the Query Store query to include some runtime stats:

SELECT  qsqt.query_sql_text,
        qsq.count_compiles,
        CAST(qsp.query_plan AS XML),
        qsrs.count_executions
FROM    sys.query_store_query AS qsq
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
JOIN    sys.query_store_plan AS qsp
        ON qsp.query_id = qsq.query_id
JOIN    sys.query_store_runtime_stats AS qsrs
        ON qsrs.plan_id = qsp.plan_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  p.Name,%';

Now, when I run this query and the one from above against the cache, I get the following information:

[Figure: adhocboth]


Here’s the interesting bit. The execution_count from cache, the top set of results, is 2, even though I ran the query three times. What happens in cache is that the plan stub is removed and the count is reset. The bottom set of results, from Query Store, shows only a single execution.

What does all this mean? Just that as we add additional behaviors to our systems, we have additional management worries. With the ability to modify the Query Store behavior, you won’t necessarily need to worry that you’re going to get hurt by your need to use Optimize for Ad Hoc.

Mar 07 2016

SQL Server Backups Are A Business Decision

Blog post #3 in support of Tim Ford’s (b|t) #iwanttohelp, #entrylevel

Read more about Tim’s challenge here.

It’s very easy to think of SQL Server backups as a technical problem. You have so much stuff going on: BACKUP DATABASE commands, recovery models, BACKUP LOG commands, differential backups. Getting them all into the correct order and automating the processes sure seems like a technical problem. It isn’t. It’s all about the business. If you’re taking on the duties of a DBA, whether you’re an accidental DBA, a reluctant DBA, or you were voluntold into the DBA position, you need to plan to sit down with responsible parties from the business and come to an understanding with them regarding RPO and RTO.

RPO is a TLA for Recovery Point Objective. The easiest way to describe RPO is to ask, “In terms of time, how much data are we willing to lose?” The immediate answer is always going to be zero. Here is where we have to be honest. You won’t be able to guarantee zero data loss (yeah, there are probably ways to do this, but #entrylevel). Talk with the business. Most of the time, you’ll find that they’d actually be OK with 15 minutes, or maybe 5 minutes, or even an hour of lost data. It really varies, not only from business to business, but from database to database within the business (allow for this flexibility). You need to establish this number. RPO is going to help you figure out how to set up your backups, your recovery model, your logs and their backups. All that stuff that seemed so technical, it’s all based on this extremely important number that you’re going to work with the business to arrive at.

Oh, but we’re not done. Once you’ve managed to get the business comfortable (as comfortable as they can be) with the idea that they could lose data, you also have to prepare them for the idea that, in the event of a disaster, restoring the database from backups is not going to be instant. It’s going to take time. This is where we have to define the RTO, or Recovery Time Objective. This is our goal for how quickly we can restore the database. RTO is not so much a negotiation with the business as it is an education for the business. You see, you can only restore so fast on your hardware. Further, the RESTORE DATABASE process is dependent on the size of the backups. Even further, it’s dependent on the types of restore operations we’re running and whether or not we use WITH RECOVERY in the RESTORE operations. You may have to test a few restores to get an idea of how fast things are on your system. Regardless, the RTO has to be arrived at and agreed on. You may also have to readdress the RTO as the number, size, and volume of your data changes over time. Be prepared for this as well.

With the RPO established, you can now decide on the recovery model. Let’s take an example. If the business says that they can afford a day of data loss, then, depending on the size of your database, you can put the database into SIMPLE recovery, run a full backup once a day, and walk away a winner. Another example: the business decides that it could live with up to 15 minutes worth of data loss. Now you have to go to FULL recovery and you have to set up log backups in addition to your full backups. Then you start to mix the RTO in. Let’s say your outage was to occur at 8PM and you run your backups at midnight. You now have to restore 20 hours worth of log backups. That can take a long time. So, in order to make the RTO as short as possible, you toss in a differential backup every day at noon. Now you’ll only ever have to restore at most 12 hours worth of log backups, so you can define a rough RTO for the business.
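To make that second example concrete, here’s a hedged sketch; the database name, file paths, and schedule are placeholders (you’d schedule these through SQL Agent or similar), not prescriptions:

--15-minute RPO: FULL recovery, nightly full, noon differential, log backups every 15 minutes
ALTER DATABASE MyDatabase SET RECOVERY FULL;

--Midnight: full backup
BACKUP DATABASE MyDatabase TO DISK = N'D:\Backups\MyDatabase_Full.bak';

--Noon: differential backup
BACKUP DATABASE MyDatabase TO DISK = N'D:\Backups\MyDatabase_Diff.bak' WITH DIFFERENTIAL;

--Every 15 minutes: log backup
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_Log.trn';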

These simplified, and somewhat simplistic, examples are just the start of the process of figuring out how best to do your backups. However, that’s the technical part of the problem. The fundamental definitions that you have to have in order to start solving this technical issue are business decisions that you must get your business people involved with. Define the RPO and RTO, then start defining your recovery strategy.

Jan 12 2016

“Applies To…” in the MSDN Documentation

Quick little post. I just wanted to share how happy I am with the new “THIS TOPIC APPLIES TO” infographic. An example here:

[Figure: appliesto]

I think it makes things much more clear when you’re attempting to figure out what’s up with some T-SQL syntax.

Well done Microsoft and thank you.

Side note, this only exists in documentation that has been updated recently. I first saw it in some documentation that was updated January 11, 2016. It’s not there in another piece of documentation I saw that was updated October 15, 2015. Here’s hoping it gets put everywhere. It works.

Oct 20 2015

Statistics for the New Data Pro

Next week at the PASS Summit I’ll be presenting a session called Statistics for the New Data Pro.

You can read the abstract at the link. I just want to emphasize that this is a beginner level session. I think way too many people who are just starting out with SQL Server don’t understand the role that statistics play in determining how your queries are going to behave. What’s more, too many people don’t know how to get and read statistics to understand how it is that the optimizer thinks you have X number of rows in your database that match a given value. I’m going to make darned sure that the people who attend this session come out with a full understanding of how to read the statistics. This includes the good information available in the header, how the density graph is used, and a complete understanding of the monstrosity that is the histogram (it’s not that bad).
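If you want a head start on reading them, the raw statistics come from DBCC SHOW_STATISTICS; a minimal sketch, using an AdventureWorks index as the example:

--Returns the header, the density information, and the histogram for one statistics object
DBCC SHOW_STATISTICS ('Person.Address', 'IX_Address_StateProvinceID');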

We’re also going to talk about statistics maintenance. You need to know how stats get updated because they absolutely impact how your queries are going to behave. There are a couple of ways that statistics are automatically maintained and I really want to be sure that you know what those are. We’ll also talk about manual maintenance, which must be done.
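As a taste of the manual side, two common approaches (a sketch; which one fits depends on your system and your maintenance windows):

--Update the statistics on one table with a full scan of the data
UPDATE STATISTICS Person.Address WITH FULLSCAN;

--Or sweep the current database, updating only stats that need it
EXEC sys.sp_updatestats;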

Overall, this is meant as a session for those who have stumbled into a position at work that requires them to be a Data Professional. You can call it accidental DBA, or reluctant database programmer, or just the poor individual who was left standing when everyone else stepped backwards one pace. If you already feel you have a firm grasp of statistics, awesome. You might not want to attend this session. It’s Summit. There are others during the same time slot. But, I have a request. Could you please direct your junior DBA to this? If you know someone who’s just getting started writing T-SQL code, maintaining a server, developing applications and reports, I think this session would do them good, so please send them on over.

Also, on the topic of statistics, I’d like to suggest that you go to Erin Stellato’s session, Statistics and Query Plans. That’s going to add a bunch of information in addition to what I’m covering. Plus, she’s a great speaker. It takes place the day before my session, but that’s OK. We’re covering different information. They’re very complementary sessions.