Jun 25 2014

The Utility of Execution Plans in Natively Compiled Procedures

I’m actually having problems identifying the utility of execution plans when working with natively compiled procedures. Or, put another way, why bother? I’ve posted a couple of times on natively compiled procedures and SQL Server execution plans. I’ve found the differences interesting and enlightening, but I’m seriously questioning why I should bother, at least currently. I’m sure there will be many changes to the behaviors of the natively compiled procedures and their relationship with execution plans. But right now, well, let’s look at an example. I have three simple tables stored in-memory. Here’s the definition of one:

CREATE TABLE dbo.Address
    (
     AddressID INT IDENTITY(1, 1)
                   NOT NULL
                   PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
     AddressLine1 NVARCHAR(60) NOT NULL,
     AddressLine2 NVARCHAR(60) NULL,
     City NVARCHAR(30) COLLATE Latin1_General_100_BIN2 NOT NULL,
     StateProvinceID INT NOT NULL,
     PostalCode NVARCHAR(15) NOT NULL,
     ModifiedDate DATETIME
        NOT NULL
        CONSTRAINT DF_Address_ModifiedDate DEFAULT (GETDATE())
    )
    WITH (MEMORY_OPTIMIZED = ON);

I can then create the following code as a natively compiled procedure:

CREATE PROC [dbo].[AddressDetails] @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

When I call for an estimated plan (remember, no actual plans) I’ll get this:

Scan

If you click on it, you’ll note that there’s an index scan. But the costs are all zero. Everything is FREE! Or not. The execution time is 93ms. If I put an index on the City column (see the sketch below for what that takes in 2014), the execution plan changes to the one I showed previously, an index seek, and the execution time goes to 42ms. Clearly, the scans are costing something. Scans aren’t necessarily bad and seeks aren’t necessarily good, but it’s hard to spot issues in execution plans with no costing involved at all. Which makes me wonder, should we be bothering with execution plans for the natively compiled procs? I’m honestly unsure.
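As a side note, SQL Server 2014 won’t let you add an index to an existing memory-optimized table, so “putting an index on the City column” really means dropping and recreating the table (and the schemabound procedure along with it). Here’s a sketch of what that looks like; ix_City is just my name for the hypothetical index:

DROP PROCEDURE dbo.AddressDetails; -- schemabound to the table, so it has to go first
DROP TABLE dbo.Address;
GO

CREATE TABLE dbo.Address
    (
     AddressID INT IDENTITY(1, 1)
                   NOT NULL
                   PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
     AddressLine1 NVARCHAR(60) NOT NULL,
     AddressLine2 NVARCHAR(60) NULL,
     City NVARCHAR(30) COLLATE Latin1_General_100_BIN2 NOT NULL,
     StateProvinceID INT NOT NULL,
     PostalCode NVARCHAR(15) NOT NULL,
     ModifiedDate DATETIME
        NOT NULL
        CONSTRAINT DF_Address_ModifiedDate DEFAULT (GETDATE()),
     -- range index on the filter column; City already has the required BIN2 collation
     INDEX ix_City NONCLUSTERED (City)
    )
    WITH (MEMORY_OPTIMIZED = ON);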

For most query tuning, statistics matter a lot, and that’s exactly the sort of thing I dig into in my all day seminars. I understand we still have room in Albany on July 25th. You can register here. I’m doing an all day session at SQL Connections in September in Las Vegas. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. I’d love to talk query tuning with you all day long.

Jun 17 2014

Natively Compiled Procedures and Bad Execution Plans

I’ve been exploring how natively compiled procedures are portrayed within execution plans. There have been two previous posts on the topic, the first discussing the differences in the first operator, the second discussing the differences everywhere else. Now, I’m really interested in generating bad execution plans. But here’s the interesting thing: I wasn’t able to. Or, rather, I couldn’t see evidence of plans changing based on silly things I did to my queries and data. To start with, here’s a procedure:

CREATE PROC [dbo].[AddressDetails] @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

And this is a nearly identical query, but with some stupid stuff put in:

CREATE PROC [dbo].[BadAddressDetails] @City VARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

I’ve changed the primary filter parameter to VARCHAR when the column data is NVARCHAR. This difference is likely to lead to differences in an execution plan, although not necessarily. If I load my tables up and update my statistics, then create the procedures and run them both with the same parameter values, I should detect any differences, right? Here’s the resulting execution plan:

ActualPlan

It’s an identical plan for both queries. In fact, the only difference in the plan that I can find is a CAST in the Index Seek operator for the BadAddressDetails procedure, as expected. But that implicit conversion didn’t cause the plan to show any other difference. However, execution is something else entirely. And this is where things get a little strange. There are two ways to execute a procedure:

EXEC dbo.AddressDetails @City = 'London';
EXEC dbo.AddressDetails 'London';

Interestingly enough, the first one is considered to be the slow way of passing a parameter. The second one is the preferred mechanism for natively compiled procedures. Now, if I execute these two versions of calling the procedure, I actually see different performance. The first call, the slow one, runs somewhere in the neighborhood of 342 µs. The other ran in about 255 µs. Granted, we’re only talking about ~100 µs, but we’re also talking about a 25% speed increase, and that’s HUGE! But that’s not the weird bit. The weird bit was that when I ran the good and bad queries together, the slow call on the bad query was consistently faster than the slow call on the good query. The fast call reversed that trend. And, speaking of which, the bad query, with the CAST, ran in about 356 µs, or ~25% slower.

The execution plan really didn’t show any indication that this would be slower, which made me do the next thing I did. I updated my Address table so that all the values were equal to ‘London.’ Then, because statistics are not maintained on in-memory tables automatically, I updated the statistics:

UPDATE STATISTICS dbo.Address WITH FULLSCAN, NORECOMPUTE;
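For reference, the data change itself was nothing fancy. Something along these lines, run as regular interop T-SQL, is all it takes (the literal is just the value my tests used):

UPDATE dbo.Address
SET City = N'London';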

With the statistics up to date, I dropped and recreated the procedure (there is no recompile with natively compiled procedures, something to keep in mind… maybe; more on that in a second). So now the selectivity on the index was 1. The most likely outcome: an index scan. Guess what happened? Nothing. The execution plan was the same. Then I went nuts. I converted all my tables so that a horrific mishmash of data would be brought back instead of clean data sets, I put data conversions in, and… nothing. Index Seeks and Nested Loops joins. Weirdness.

I’m actually unsure why this is happening. I’m going to do more experimenting with it to try to figure out what’s up. But, that lack of recompile, maybe it doesn’t matter if, regardless of data distribution, you’re going to get the same plan anyway. I’m really not positive that looking at the execution plan for natively compiled procedures does much of anything right now. However, these tests were a little bit subtle. I’ll load up more data, get a more complex query and then really mess around with the code to see what happens. I’ll post more of my experiments soon.

I promise not to experiment on you though when I’m teaching my all day query tuning seminars. There are a bunch coming up, so if you’re interested in learning more, here’s where to go. Just a couple of days left before Louisville, and I’m not sure if there’s room or not, but it’s happening on the 20th of June. Go here to register. Albany will be on July 25th, but we’re almost full there as well. You can register here. SQL Connections is a pretty cool event that takes place in September in Las Vegas. In addition to regular sessions I’ll be presenting an all-day session on query tuning on the Friday of the event. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. Let’s get together and talk.

Jun 10 2014

Differences In Native Compiled Procedures Execution Plans

All the wonderful functionality that in-memory tables and natively compiled procedures provide in SQL Server 2014 is pretty cool. But changes to the core of the engine result in changes to things that we may have developed a level of comfort with. In my post last week I pointed out that you can’t see an actual execution plan for natively compiled procedures. There are more changes than just the type of execution plan available. There are also changes to the information available within the plans themselves. For example, I have a couple of stored procedures, one running in AdventureWorks2012 and one in an in-memory enabled database with a few copies of AdventureWorks tables:

--natively compiled
CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

--standard
CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
AS
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    Person.Address  AS a
                JOIN Person.StateProvince  AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN Person.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
GO

The execution plans are obviously a little bit different, one going against in-memory tables and indexes and the other going against standard ones. However, that’s not the point here. This is the point. One of the first things I always check when looking at a new execution plan is the first operator, the SELECT/INSERT/UPDATE/DELETE operator. Here it is from the estimated plan of the query against the standard tables:

StandardSelectProperties

All the juicy goodness of the details is on display including the Optimization Level and Reason for Early Termination, row estimates, etc. It’s a great overview of how the plan was put together by the optimizer, some of the choices made, useful information such as the parameters used, etc. It’s great. Here’s the same thing for the natively compiled procedure:

NativeSelectProperties

Uhm… where are all my wonderful details? I mean, honestly, everything is gone. All of it. Further, what’s left, I’m pretty sure, is nothing but a lie. Zero cost? No. But it’s obviously not from the standard optimizer estimates either, so, effectively zero. I’m pretty sure Physical Operation is just there as an oversight. In short, this is a different game. Yes, you will still need to evaluate execution plans for natively compiled procedures, but we’re talking a whole different approach now. I mean, great googly moogly, there are no parameter compile time values. Are those just ignored now? Are the days of bad parameter sniffing behind us, or are the days of good parameter sniffing gone forever? And it’s not just the SELECT operator. Here are the properties for a Nested Loops operator. First, the standard set:

StandardNestedLoops

And, the natively compiled procedure:

NativeNestedLoops

Now, except for the fact that everything is FREE, the differences here are easier to explain. Execution Mode is applicable to columnstore indexes, and none of those are available yet for in-memory storage, so I’m not shocked to see that property removed. Same for the others. But this complete lack of costing is going to make using execution plans, always a problematic proposition with only estimated values available for so many things, even harder. It might even be that all you really need to do is look at the graphical plan. Drilling down into the properties, until meaningful data starts to appear there, might be a waste of time for natively compiled procedures.
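As an aside, on a standard plan those parameter compile time values are even available programmatically, straight out of the XML behind the plan. Here’s a sketch, using the same sort of XQuery against the cache I’ve shown before (filter the results however suits you):

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT prm.p.value(N'@Column', N'nvarchar(128)') AS ParameterName,
       prm.p.value(N'@ParameterCompiledValue', N'nvarchar(max)') AS CompiledValue,
       dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY deqp.query_plan.nodes(N'//ParameterList/ColumnReference') prm (p);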

I’ll keep working on these. Next up, can you get a “bad” execution plan with a natively compiled procedure? We’ll find out.

Just a reminder that I’m taking this show on the road. I’m doing a number of all day seminars on query tuning at various events in multiple countries. Louisville has almost filled the room we have available on the 20th of June. Go here to register. But don’t wait. I’m also going to be in Albany on July 25th, but we’re almost full there as well. You can register here. If you were thinking about attending SQL Connections in September in Las Vegas, note that in addition to the regular sessions I’ll be doing a day on query tuning. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event.

Jun 05 2014

Natively Compiled Procedures and Execution Plans

The combination of in-memory tables and natively compiled procedures in SQL Server 2014 makes for some seriously screaming fast performance. Add in all the cool functionality around optimistic locking, hash indexes and all the rest, and we’re talking about a fundamental shift in behavior. But… Ah, you knew that was coming. But, you can still write bad T-SQL, or your statistics can get out of date, or you can choose the wrong index, or any of the other standard problems can come up that will negatively impact all those lovely performance enhancements. Then what? Well, same as before: take a look at the execution plan to understand how the optimizer has resolved your queries. But… Yeah, another one. But, things are a little different with natively compiled procedures and execution plans. I have a very simple little example in-memory database with just a few tables put up into memory and a straightforward procedure that I’ve natively compiled:

CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

The fun thing is, even with these really small examples, the performance differences are staggering when compared to standard tables or just in-memory tables alone. Anyway, this is what the estimated plan for this procedure looks like:

ActualPlan

Looks like a pretty standard execution plan right? Here’s the actual plan:

 

 

No, don’t bother refreshing your browser; that’s just a blank couple of lines, because there is no actual plan. You’re not dealing with a standard query, remember. There are only a couple of reasons to want an actual plan. First, if you’re experiencing recompiles, you might want to see the plan that was ultimately executed. An actual plan will reflect this, as will a plan pulled from cache. Second, you want to see some of the run-time metrics: actual rows, actual executions, runtime parameter values. Well, the first is not an issue, since you’re not going to see these things recompile. It’s a DLL. The second could be an issue. I’d like to see actual versus estimated to understand how the optimizer made its choices. Regardless, the actual plan won’t generate in SSMS when you execute the natively compiled procedure.
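If you want to capture the estimated plan, the one thing you can get, without clicking buttons in SSMS, a sketch like this works:

SET SHOWPLAN_XML ON;
GO

-- Returns the estimated plan XML instead of executing the procedure
EXEC dbo.AddressDetails @City = N'London';
GO

SET SHOWPLAN_XML OFF;
GO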

There are some more differences between the plans for natively compiled procedures and standard procedures. I’ll go over a few more in another blog post.

Hey, do you want to talk query tuning? I’m taking my one day seminar on the road to a bunch of different events. There’s still time to get to the event in Louisville on the 20th of June. That’s right before the SQL Saturday there. Go here to register. I’m also going to be putting this on the day before SQL Saturday Albany. You can register here. I’m very honored to have been selected to speak at SQL Connections in September in Las Vegas. This includes an all day seminar on query tuning. Go here to register for this great event. I’m also very excited to be able to say that I’m going to be doing a different seminar in Belgium for SQL Server Days. I’ll be presenting an all day seminar on execution plans, including lots of details on SQL Server 2014. Go here to register for this event.

That’s four opportunities to get together and spend an entire day talking about query tuning, execution plans, statistics, the optimizer, extended events, oh, all sorts of things.

Mar 18 2014

Finding Mistakes

Ever had that moment where you start getting errors from code that you’ve tested a million times? I had that one recently. I had this little bit of code for pulling information directly from query plans in cache:

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
QueryPlans AS
(
SELECT RelOp.pln.value(N'@PhysicalOp', N'varchar(50)') AS OperatorName,
RelOp.pln.value(N'@NodeId',N'integer') AS NodeId,
RelOp.pln.value(N'@EstimateCPU', N'decimal(10,9)') AS CPUCost,
RelOp.pln.value(N'@EstimateIO', N'decimal(10,9)') AS IOCost,
dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY deqp.query_plan.nodes(N'//RelOp') RelOp (pln)
)

SELECT  qp.OperatorName,
        qp.NodeId,
        qp.CPUCost,
        qp.IOCost,
        qp.CPUCost + qp.IOCost AS EstimatedCost
FROM    QueryPlans AS qp
WHERE   qp.text = 'some query or other in cache'
ORDER BY EstimatedCost DESC;

I’ve probably run this… I don’t know how many times. But… I’m suddenly getting an error:

Msg 8114, Level 16, State 5, Line 7
Error converting data type nvarchar to numeric.

What the hell? There is nowhere this should be occurring. I dig through the query over and over and I can’t figure it out. Until… I finally notice that one of the operators in the plan has its EstimateCPU value stored as a float, written out in scientific notation:

EstimateCPU="1e-006"

Ummmm, since when? Since forever. I’ve just been lucky with my code. I’d just never hit a sufficiently small cost in the plans before. I hadn’t bothered to look for the actual data type in use in the schema definition, although it’s right there:

<xsd:attribute name="EstimateCPU" type="xsd:double" use="required"/>

Long Kiss Goodnight (Samuel L. Jackson)

I never did one thing right in my life, you know that? Not one. That takes skill.

What did I do wrong? I was looking at the data output from the queries and in the plans, rather than looking at the structure to know what to expect. It’s the kind of thing I would never do with T-SQL. I would always look to the table structure to know what data type a given column was. But in this case, with the XML, I looked at the data and made an assumption. And we all know what that means. It makes an ass out of you and mption.

Or, in this case, me and mption.

Anyway, the corrected query is a pretty trivial change:

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
QueryPlans AS
(
SELECT RelOp.pln.value(N'@PhysicalOp', N'varchar(50)') AS OperatorName,
RelOp.pln.value(N'@NodeId',N'integer') AS NodeId,
RelOp.pln.value(N'@EstimateCPU', N'float') AS CPUCost,
RelOp.pln.value(N'@EstimateIO', N'float') AS IOCost,
dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY deqp.query_plan.nodes(N'//RelOp') RelOp (pln)
)

SELECT  qp.OperatorName,
        qp.NodeId,
        qp.CPUCost,
        qp.IOCost,
        qp.CPUCost + qp.IOCost AS EstimatedCost
FROM    QueryPlans AS qp
WHERE   qp.text = 'SELECT * FROM HumanResources.vEmployee AS ve'
ORDER BY EstimatedCost DESC;

But I do feel bad if anyone has been using this the way I showed it. ’Cause, yeah, I’ve demonstrated with this code in the past. Oops. However, it does make a great point: especially when working with a public XML schema like this, it pays to go and look at that schema the same way you would a table, in order to ensure that you’re using the correct data types.

Feb 12 2014

SQL Server 2014 and the New Cardinality Estimator

Cardinality, basically the number of rows being processed by an operation within the optimizer, is a calculation predicated on the statistics available for the columns in question. The statistics used are generally either the values from the histogram or the density. Prior to SQL Server 2014, and going all the way back to SQL Server 7.0 (in the Dark Ages when we had to walk uphill to our cubicles through 15 feet of snow, battling Oracle DBAs and Fenris the whole way), there’s been one cardinality estimator (although you can modify the behavior somewhat with a trace flag in 2008R2 and 2012). Not any more. For really complex, edge-case queries, there’s a possibility that you may run into a regression from this.
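To make that concrete, here’s a sketch of the kind of math involved. The statistics name CityStats is my own invention, and the numbers in the comments are rough AdventureWorks-ish values, not gospel:

-- Create statistics on the column, then look at the density vector
CREATE STATISTICS CityStats ON Person.Address (City);
DBCC SHOW_STATISTICS ('Person.Address', CityStats) WITH DENSITY_VECTOR;

-- Density is roughly 1 / (number of distinct values). If "All density" comes
-- back as about 0.0017 and the table holds about 19,600 rows, a density-based
-- equality estimate works out to roughly 0.0017 * 19600 = ~33 rows.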

You control whether or not you get the new cardinality estimator by setting the Compatibility Level of the database to SQL Server 2014 (120 for the picky amongst us). This could lead to regression issues. So, you’re going to pretty quickly want to know if your execution plan is using the new Cardinality Estimation Model, right? It’s not tricky. Just look at the properties of the first operator in the plan (I told you to use that first operator). You’ll find one value there that will tell you what you need to know:

CardinalityEstimator

Just check this value (which you can also get from the XML behind the graphical plan) to see what calculations the optimizer used to arrive at the plan you’re observing.
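If you’d rather script it, here’s a sketch. The database name is just an example, and the XQuery pulls that same value out of the plans in cache ("70" means the old model, "120" the new one):

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 120;
GO

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT deqp.query_plan.value(N'(//StmtSimple/@CardinalityEstimationModelVersion)[1]',
                             N'varchar(10)') AS CEVersion,
       dest.text
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest;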

Sep 16 2013

Why the Lazy Spool Is Bad

First thing, there are no bad operators, just bad parents, uh, I mean query writers, or database designers, or ORM tools. Why do I say this? Because all the operators within a query execution plan serve a purpose. They are there to fulfill a task. Depending on where and when you see them, they’re doing exactly what you ask of them. The issues come up because you’re asking them to do a task that they may not be well suited for. This comes from inappropriate structures and inappropriate code. Lazy spools are not really bad (that was just link bait). In fact, depending on the query, what’s being done, how you’re retrieving data, what data is being retrieved, the lazy spool is actually awesome. But, it’s good to know what these things are doing because, like magic, the lazy spool comes with a price.

The spool operators within SQL Server are indications of the need to store data for reuse. That’s it. A spool is just gathering up information and putting it into a temp table so that it can use that data again. Key words and trick phrase in that last sentence: temp table. Again, depending on the needs of the query, temporary storage is a great solution. BUT, you are creating more objects in tempdb, adding overhead to your disks, and everything else that’s associated with that. So if your system is already under load in this area, then, depending on the query in question, you may need to make adjustments in order to eliminate the spool. The use of tempdb is the price you pay for the lazy spool.

A lazy spool is a non-blocking operator. That means it doesn’t stop the flow of data while it gets loaded. Instead, it acts more or less as a pass-through, letting data flow through it as needed by the downstream operations. That makes a lot of sense when you consider that the lazy spool is loaded in a “lazy” fashion, or, as requested. The eager spool, by contrast, stops all operations until it completes loading its temporary object, loading in an “eager” fashion. Once data is loaded into the lazy spool, it can be used by other operators within the plan.

Let’s see this in action. I’m going to call a recursive CTE query from AdventureWorks2012:

EXEC dbo.uspGetEmployeeManagers @BusinessEntityID = 9

This query results in the following execution plan (click on it to expand it).

Lazy

Reading plans that involve spools is where you really see the need to understand both the logical and physical processing order within a plan. The reason I say that is, if you just read this from right to left, the physical order, you’re likely to think that the spool is being loaded all the way over there on the right. Instead, it’s being loaded by Node ID 5, which I show below.

LazyDetail

This shows how the logical processing order, left to right, like reading a book, has to be taken into account to deal with query plans. Now, in this case, the spool is how the optimizer is dealing with the recursive nature of the query. It loads data into a spool operator in a lazy fashion and that data is there when needed later within the operations of the query. In short, the lazy spool in this case is doing a good thing and helping you out.

So, why is the lazy spool bad? Well, it isn’t. It’s just not free.
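If you want a bare-bones repro to play with, any recursive CTE will do. A hypothetical minimal example like this one will typically show the Table Spool (Lazy Spool) pattern in its plan:

-- The anchor member seeds the spool; the recursive member reads it back lazily
WITH Numbers (n)
AS (SELECT 1
    UNION ALL
    SELECT n + 1
    FROM Numbers
    WHERE n < 10
   )
SELECT n
FROM Numbers;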

For more on spools, you can read an older post of mine here, excellent information from Fabiano Amorim here, and an example of forcing a spool to improve performance by Paul White here.

Want to talk more query tuning? Let’s get together in Dallas on November 1st, 2013 where I’ll be doing an all day pre-conference seminar before SQL Saturday 255.

Sep 11 2013

Execution Plan Cost Estimates

It’s been emphasized over and over that the costs of operations within an execution plan, and the estimated costs of the plans themselves, are, in fact, estimates. But it goes further than that. The estimated values are based on statistics, or the lack thereof. Statistics themselves are also estimates. This means that the costs you’re seeing are extrapolations based on extrapolations. So, you should just ignore those values and move on, right? Wrong.

In order to understand how the optimizer is choosing to put together an execution plan for your query so that you can use that understanding to then make intelligent choices as to modifying the query or the structure of your database, you must use the values you have at hand. However, you must also understand where and how those values were derived in order to make a determination on how much faith you can put into them (because you are simply placing faith in those numbers). Above all else, this means you must understand your data as presented by the statistics and what these statistics represent in terms of defining your data.

For a more detailed discussion of what exactly statistics represent, see my article in Simple-Talk.

Next, you need to understand what the operator itself is doing. In the majority of cases, this means just reading the description. They’re usually pretty clear. But sometimes, it might not be that clear. Further, while the description of an operator may be clear, you need to further understand why or what it is doing, not simply what it is. This means drilling down further, reading through the properties of the operator (NOTE: not the tool tip) in an attempt to understand what’s going on. Boogle or Ging will be a friend here. You can search up descriptions of what operators are to assist with your understanding. You need to know when something doesn’t even have statistics, such as a table variable, and that therefore the optimizer assumes it only has 1 row, because that’s going to radically affect the cost estimates displayed for you, even in an actual execution plan. That’s right: the actual plan costs are still just estimates.
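That table variable point is easy to demonstrate. A sketch (the names here are just for illustration, loaded from an AdventureWorks table):

DECLARE @SalesRows TABLE (SalesOrderID INT NOT NULL);

INSERT @SalesRows
       (SalesOrderID)
SELECT soh.SalesOrderID
FROM   Sales.SalesOrderHeader AS soh;

-- No statistics exist on the table variable, so the optimizer estimates 1 row
-- for this scan, regardless of how many rows actually landed in it, and that
-- estimate shows up even in the actual plan.
SELECT sr.SalesOrderID
FROM   @SalesRows AS sr;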

I wish I could tell you to rely on these numbers, but you can’t. I wish I could tell you to ignore these numbers, but you can’t. These are the only numbers you get that show you what’s happening internally within the query within the optimizer, so you must use them. Just use them with the full knowledge that they are calculations based on other calculations based on extrapolations. In short, a bit of a guess.

Want to talk query tuning, execution plans, optimizer and statistics some more? Come see me at the all day pre-conference seminar in Dallas before the 2013 SQL Saturday there in Dallas this November.

Mar 25 2013

Execution Plans in Azure SQL Database

Microsoft has stated pretty clearly that they’re putting code on Azure first, ahead of the desktop. Which makes one wonder when we’re going to start to see some of this cool stuff within SSMS. What cool stuff you ask? Well, let me explain. Let’s start with a query:

SELECT	m.MovieName,
		msd.MovieStageName,
		st.StageTypeDesc
FROM	dbo.Movie AS m
		JOIN dbo.MovieStage AS ms
		ON m.MovieId = ms.MovieID
		JOIN dbo.MovieStageDefinition AS msd
		ON ms.MovieStageDefinitionId = msd.MovieStageDefinitionId
		JOIN dbo.StageType AS st
		ON msd.StageTypeId = st.StageTypeId
WHERE	m.MovieId = 42;

When I run this on Windows Azure SQL Database (WASD) I get the following execution plan:

Azure Execution Plan

Kind of weird, kind of useful, right? The first new thing I can do is zoom in using that slider bar you see in the lower left, and then the graphical part of the plan looks like this:

Azure Execution Plan Zoomed In

The icons have shifted into the corner and you get the costs of every operator. You also get text describing what the operator does. You can scroll around to see the other operators. Pretty cool stuff. Yeah, I know you can zoom in & out on SSMS, but it doesn’t modify the icons in any way, making viewing it zoomed out all but useless. WASD gets better. Notice the tool bar on the left. I have it here in full size so you can see everything:

AzureToolBar

Now things get fun. First, at the top, there is a little arrow pointing to the left. You can hide this tool bar by clicking anywhere within it that’s not on one of the icons. The arrow is indicating the possibility of hiding the toolbar. Once hidden, the arrow changes to pointing to the right. Clicking again expands the toolbar. Immediately below that arrow, you see the really fun stuff: Sort by:. You’ll notice that Total is currently selected. This shows me the total estimated cost for each operator within the plan. But, I can modify the display of the plan so that I only see estimated CPU cost or estimated IO cost. The following graphic shows the same plan with CPU selected. See how the estimated cost percentages have all shifted to show the different highlights?

AzureExecPlanSorted

This is a great way to rearrange the view of the plan, bringing out different information. The only beef I have with it is that it doesn’t really sort the operators so much as change the display. I understand what they mean, but I think it’s misleading. Anyway, useful, but I’ve got more. Moving down the tool bar we find the great part: Find by:. Currently on the plans above you can see that None is selected. But what if I select a different operator such as Seek:

AzureExecPlanFind

Now, each instance of the operator type I selected is highlighted in blue, but, the sort totals for estimated costs are still displaying the numbers in red, so I haven’t lost any of the display that I’ve set up so far. This is wonderful stuff. As you can see, you only get a few operator types, Warnings, Scan, Seek, Merge Join, Hash Match, Nested Loops and Sort. But it’s a great way to begin to explore the plan in ways that we can’t currently within SSMS.

AzureIcons

Let’s also note the little icons in the upper right of the plan, displayed to the left in full size. Clicking on these completely changes what’s represented to us from the execution plan. First, the little grid-looking icon results in this execution plan:

AzureExecPlanGrid

You can click on the graphic to make it bigger so you can see everything. What we have are the operator nodes, listed in Node ID order, showing some of the interesting information from the properties in a grid format. And yeah, because it’s a grid, I can click on any of the column headers and get a different sort order for the data. For large plans, this can be a quick and easy way to find the highest estimated cost, or the largest number of actual rows. The grid isn’t a text plan. The grid is just the same XML data that makes up the execution plan, laid out slightly differently. Great stuff indeed. What if we click on the little bullet-list-looking icon?

AzureExecPlanBullets

Now we have a sort of nested display, almost like human-readable XML. And the little blue icons next to the operator names do allow you to expand and collapse the layout. I haven’t decided how excited I am by this layout because I’m just not sure how I would use it. You can’t re-sort the columns even though this looks like a grid. It’s just a way to simplify the display with the collapsible nesting. Still, another tool in the toolbox.

Microsoft is doing some really cool stuff out there on WASD, making the portal there functional and useful. Let’s hope that some of this translates down to the core product.

If you’re interested in learning more about this kind of thing, you should take a look at the pre-conference seminar at TechEd 2013 that I’ll be putting on with Thomas LaRock (b|t) and Dandy Weyn (b|t) in New Orleans and Madrid, Spain. It’s called How to be a Successful DBA in the Changing World of Cloud and On-Premise Data. We’ll spend all day getting you deep into the guts of Azure SQL Database showing how to administer it, tune it, grow it, and make it work. You don’t want to miss it. Click here for New Orleans or here for Madrid to register for this and TechEd 2013.

I promise you, with Tom, Dandy and me all talking at once, this will be informative and entertaining.

Jan 30 2013

Azure SQL Database Execution Plan Differences

I’ve been exploring execution plans in Azure SQL Databases a lot lately. I’m getting a presentation together for some upcoming SQL Saturday events (the first one is SQL Saturday #177, Silicon Valley). If you scroll to the bottom of this previous post, you’ll see I mentioned that there were clearly differences in the optimizer, because queries against empty databases were generating different plans. I’ve now loaded up the data in my database, both SQL Server and SQL Database, so I can compare real behaviors. Doing so, I found a fun difference, even though I was running the query and generating the plan from SQL Server Management Studio.

Here’s the property sheet from the SELECT operator for the query run against SQL Server:

AzureSelectProperties

And here’s the property sheet from the SELECT operator for the query run against SQL Database:

AzureSSMSSelectProperties

Two differences jump out. The smaller of the two is that, for the same data set, there are differences in estimated values. I noted that before, but it’s fun to see it coming up. By the way, if a good estimate could be defined as “closest without going over,” then the optimizer in SQL Database is closer, with 1.9 against the SQL Server value of 2.1. Just sayin’. Anyway, the big thing that you may have noticed was the NonParallelPlanReason property with a value of CouldNotGenerateValidParallelPlan. Oooh, cool! Not that I would have expected a parallel plan for an estimated cost of .0149347. Assuming SQL Database has a default cost threshold for parallelism of 5, that’s pretty far off (and I hope they don’t still use that ancient value, nor should you). Some enterprising individual could try to suss out how much cost is needed to get a parallel plan (I don’t have time for that right now). More to come on execution plans and query tuning in Azure SQL Databases.