Nov 21 2016

Query Store Data

The data in the Query Store is what makes all the magic happen. From collecting the aggregate performance metrics of a query, to the various plans in use by that query, to being able to force a plan, it’s all controlled by the data within the Query Store system tables.
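
Just to show what I mean, all of that information is a join away in the Query Store catalog views. Here’s a rough sketch (nothing tuned about it) that lists each query, its plans, and the aggregated runtime numbers:

SELECT qsqt.query_sql_text,
   qsq.query_id,
   qsp.plan_id,
   qsrs.count_executions,
   qsrs.avg_duration
FROM sys.query_store_query AS qsq
JOIN sys.query_store_query_text AS qsqt
   ON qsqt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan AS qsp
   ON qsp.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats AS qsrs
   -- one row per plan per runtime stats interval
   ON qsrs.plan_id = qsp.plan_id;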

The Question

When I was presenting on this topic at the PASS Summit a few weeks ago, one great question came up (great question = the answer is “I don’t know”). Well, I defaulted to an “I don’t know” answer, but my guess was “no.” The question was: can you take a plan from one server, let’s say a test server, export it in some way, and then import it to production? That way, you could ensure that a plan you like gets into production without having to clear the plan from cache and generate a new plan by running the query.

Great idea.

However, I knew that the Query Store information is all stored in system tables. You can’t directly modify system tables in SQL Server (well, not any more; in the old versions prior to 2005, you could. Ah, the joy of REALLY screwing up a database). Still, I tested it. You won’t be shocked to find out: nope, you can’t.

So What Can You Do?

You have a number of data manipulation options. Since you can query the Query Store, you can obviously export that data and store it elsewhere. When you back up a database, the Query Store data gets backed up. You can also directly manipulate the data using the procedures listed here to flush the store, remove a particular query or plan, or reset the performance statistics.
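
For reference, those manipulation procedures look roughly like this. The IDs below are placeholders; pull real ones from sys.query_store_query and sys.query_store_plan:

-- push any Query Store data still in memory out to disk
EXEC sys.sp_query_store_flush_db;

-- remove a single query (along with its plans and stats), or a single plan
EXEC sys.sp_query_store_remove_query @query_id = 42;
EXEC sys.sp_query_store_remove_plan @plan_id = 99;

-- reset the runtime statistics for a single plan
EXEC sys.sp_query_store_reset_exec_stats @plan_id = 99;

-- or clear the entire store for the database
ALTER DATABASE CURRENT SET QUERY_STORE CLEAR;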

That’s about it. In its current incarnation, there’s not much in the way of direct manipulation of the data in the Query Store. It’s just like any other set of system tables in that way.

Just Imagine

I’d love to see them add the ability to upload, or import, an execution plan. Imagine the scenario where you have sharded your database across multiple servers, or, if you’re in Azure SQL Database, multiple databases. You identify a particular execution plan that’s going to work well for all these instances. Wouldn’t it be great to be able to import that plan and force all the various servers to use it, all programmatically? Heck yes.

However, we’re not there today.


I love talking performance tuning. In fact, I’ll be doing an all-day seminar on execution plans and query tuning before SQLSaturday Providence in Rhode Island in December 2016, so if you’re interested, sign up here.

Nov 18 2016

Book Review: Marshmallow Test

Different people approach their career in different ways. My working assumption, all the time, is that I don’t know enough and I’m not good enough. Don’t get me wrong, I’ve got an ego (way too big of one really). I just know that there’s room for improvement. It’s one of the reasons I read books like the Marshmallow Test.

The core concept is simple. You show a kid a treat, a marshmallow. You let them know that they can have that marshmallow right now or, if they wait, they can have two later. Kids who choose to wait, and succeed in waiting, generally do better in life: better jobs, lower BMI, all sorts of things. It’s about discipline and self-control and, even more importantly, delayed gratification. “Yeah, I can have something good now, but I can have something terrific if I wait.”

The book goes through where this experiment came from, how the data was correlated, and all the other types of experiments that sprang from this as a mechanism for managing self-control. Self-help books can frequently be either overly dry or horrifyingly preachy. The Marshmallow Test is neither.  The writing is engaging and the information is presented in a digestible fashion. If anything, I’d love to see more numbers and number crunching on how they arrived at their conclusions. That’s available, but it’s in scientific papers referenced within the book.

The best section of the book was the end, where they talk about how you can apply this learning and understanding in teaching children and in managing yourself. Personally, my kids are grown, so any damage I could do there is done; it’s just down to me. The focus on willpower and self-control is largely around avoidance (stopping drinking, eating less), and I’m finding it effective (when I remember to do it). If there was one thing I’d like to see more of in the book, it’s positive willpower: not simply how to put off eating that last bit of chocolate that’s calling me from the cabinet, but the self-control to get this review written, to start working on another blog post.

If you’re interested in learning about how self-control develops in people, this is a great read. If you need to get some bad habits under control, this book will help. If you’re looking for something to get you up, off the couch, it’s not quite there.

Nov 09 2016

Azure Data Platform Learning Resources

I know that more and more of you are moving more and more resources into the Azure Data Platform. In an effort to help you learn about all that Azure offers, I created a GitHub repository of blogs, teachers, courses, etc. It’s open to the community to update. New material has been added recently and there will be more all the time. Check it out. If you know of resources that should be listed there, please submit a pull request and add them right in. This is your resource.

Nov 08 2016

PASS Board Update: Post-Summit 2016

Monday

I got in on Sunday and chose to have a small dinner with a couple of friends, quiet, preparing.

Monday was a less hectic day than the others. The Board had the morning off, although Redgate had me go and give a session at an event. Monday afternoon was one of our three in-person board meetings. The minutes will be published soon. I was responsible for running the meeting. I also presented two topics: first, and most importantly, our current financial status; then, the initial set of thoughts towards some SMART goals for Global Growth, which I will share once they are further developed.

Monday evening I had two events I had to attend. First, as part of the Executive Committee, I attended the kick off dinner for the PASS HQ team. Next, I went to the Friends of Redgate Dinner. Then, I went to bed early because the next day had a bunch of important meetings with the community.

Tuesday

Now things start to rock. We started the morning with a meeting with the SQLSaturday organizers. The meeting was run by the SQLSaturday portfolio owner, Argenis Fernandez. I was there as a representative of the Exec and as a member of the Global Growth committee. The gist of the meeting was to report on progress since last year and plans for the upcoming year, and to give the community a chance for direct feedback. It went well.

Next, I had a meeting with the Regional Mentors. This was co-run by myself and Ryan Adams. Ryan, as the Chapters portfolio owner, and I, as the Global Growth committee chair, have very similar thoughts and plans for how to expand and grow the duties, impact and opportunities for the Regional Mentors. These people are vital for the maintenance of the PASS community, and, I would argue, for the growth of it, especially globally. We largely discussed plans for the coming year.

I’m not done with Tuesday. I next went to the Chapters Leaders meeting, again as a representative of the Exec and the GG committee. Ryan’s meeting ran much like Argenis’ with a report on what has been done. Ryan is working hard to deliver on a number of initiatives,  including launching a Speakers Bureau. Ryan then outlined some plans and took feedback from the community.

In my mind, these were successful meetings because we got the word out on what we’ve been working on, and we got direct, positive and actionable feedback from the members of the community who took part.  Great stuff.

But Tuesday isn’t finished. I next attended the First Timers speed networking event. I co-hosted the event with Bill Graziano. It was a blast. Unfortunately, I had to leave early because I was booked to be in three places at the same time. I only made two.

I went to the Global Alliance Program partners’ reception with several members of the Board. The GAP partners play an important role in supporting PASS’ growth. Thanks to all involved.

Next we had the traditional Tuesday night Welcome Reception. This is the easy part of the job. I was just there, walking around the room, talking to people. It was a nice night with lots of old friends and a few new ones.

Still not done because I went to the Volunteer Party. PASS is a not-for-profit organization and a heck of a lot of work is done by the volunteers, so we show our appreciation here. I had some great talks to follow up on the information raised at meetings during the day (yes, parties are work too).

Finally, I went to the Karaoke party hosted by Denny Cherry.

Wednesday

PASS Summit launches with the Wednesday keynote. I was there bright and early in case there was anything I could do to help out. It was all in hand, so I got to kick back a little and watch Joseph Sirosh deliver a bunch of great information and a positive vision for the Data Platform moving forward.

Following the keynote, the executive committee had a meeting with Microsoft. As one of our founding partners, and the reason we’re all here, we meet with Microsoft regularly. It’s great to have these opportunities to meet face-to-face with our key partners at Microsoft.  I truly appreciate their time and their feedback.

I was part of the keynote on Thursday, so I had to spend some time rehearsing on Wednesday. Not mentioned in the days above is the time I spent reviewing my slides and helping prepare the speech that goes with them. PASS HQ does most of the work after I set the vision, but I still had to ensure that I delivered the vision that I wanted.

I took some time after this to spend at the Redgate booth. I still have a job there to do.

In the late afternoon, I put on a regular session, The Query Store and Query Tuning in SQL Server. I had 550 attendees (no, not inside knowledge, I asked the room proctor on the way out the door) and I think it went well. I love the topic (just search the blog if you don’t believe me).

I went back down to the Redgate booth and worked until the Exhibitor Reception that night.

Done. I went to my hotel at 8PM this night in order to be ready for the next day.

Thursday

KILT DAY!

Keynote time: telling Summit attendees how PASS is doing financially and, more importantly to me personally, how we’re doing growing globally. I’m pretty passionate about this message and, based on the tweets I saw later, I think it came across.

Oh, but before I gave my first ever keynote, I got to hang out in the Green Room (trust me, not exciting) with Rimma Nehme and Dr. David Dewitt (trust me, WAY exciting). Yeah, I was totally cool, and not a fan-boy at all… mostly.

Yeah, and I had to wear makeup because I was on TV in front of the giant screens. Not my favorite part of the experience.

Time for more meetings. I met with the finance team to discuss how we’re doing on the audit (great by the way) and lay down some plans for the coming year. I’m very excited with what they’ve done around finance, including putting together some details around the cost of SQLSaturday events. We’re doing more like this so that we can help all the various portfolios plan better for next year.

I went back to the Redgate booth to work some more. I also spent time in the Community Zone, just there to answer any questions that anyone had. I took a few notes from people, all that’s going to help us make good decisions going forward.

Redgate paid for one of the new sponsored sessions. We talked about SQL Clone and had a full room of people who were very interested.

Thursday night this year had no official event.  This enabled more community events to be planned and also allowed for less formal community gatherings to take place. From what I saw, this was a wild success. Game Night was created, and well attended. Various other community events sprang up. Vendors also had a night where they weren’t competing directly with PASS. In fact, I went to the Redgate Rocks event (we held a SQL Clone launch party in this wonderful space, good times).

Friday

Finally, it was Friday, after a long, demanding, and rewarding week. Believe it or not, I only had two things that I had to do for PASS on Friday. First, I had an interview with Tony Davis for a vblog. Then we had the board Q&A in the afternoon.

The Board Q&A was held this year in the Community Zone. It let people wander in and out at will. I think it worked great. We had more and more people with us over time. It was a lot more than we see when the event is held in one of the smaller meeting rooms. It was a little hard to hear sometimes, but except for that, it really was fantastic.

After that, I had one personal mission. I’m pretty sure I didn’t succeed, but I tried to make a point of thanking each and every member of PASS HQ for the hard work they had done this week, putting on the…

BEST

SUMMIT

EVER!

Friday night I went into hiding and had a quiet dinner with a few friends (which means I excluded other friends, for which I’m truly sorry).

Conclusion

Sessions? What sessions? Other than the ones I put on, I was only able to see part of two of the Speaker Idol sessions (great stuff). Other than that, it was all PASS, or all Redgate, all the time.

Being on the board does change the Summit experience. It’s not worse. It’s different. In some ways, it’s better. I’m truly appreciative that you’ve all given me the opportunity to take part in putting something like this together. We’re now going to start ramping up for next year. If you have feedback, please, get it to us.

Nov 07 2016

sp_executesql Is Not Faster Than an Ad Hoc Query

This requires an immediate caveat. You should absolutely be using sp_executesql over any type of non-parameterized execution of T-SQL. You must parameterize your T-SQL because the lack of parameters in building up and executing strings is a classic SQL Injection attack vector. Using straight ad hoc T-SQL is an extremely poor coding choice because of SQL Injection, not because there is something that makes one method faster than the other.

Yet, I see in performance checklists that you should be using sp_executesql over straight ad hoc T-SQL because it will perform faster. That statement is incorrect.

Some Discussion

Let me reiterate the caveat before we continue. I 100% advocate for the use of sp_executesql. This procedure is preferred over ad hoc SQL because, used properly (and isn’t that usually one of the main problems?), you can both build a query dynamically and use parameters in order to avoid SQL Injection. The security implications of SQL Injection are hard to over-emphasize. SQL Injection has been a primary vector for hacking for close to twenty years now. We know the best way to avoid it is to use parameterized queries with data validation around the parameters. Why this is continually ignored is hard for me to understand.

However, despite the importance of using sp_executesql, I’m not advocating for its use as a performance improvement mechanism. I’m unclear as to how this came to be on a performance checklist with no discussion of taking advantage of parameter sniffing and/or plan reuse (the possible performance advantages). I can only assume this is yet another example of Cargo Cult Programming. People know that they are supposed to use sp_executesql (and yes, you are supposed to use it), but they don’t really understand why, so they start guessing.

The tests are going to run primarily from T-SQL in order to compare a straight EXECUTE of a query string to sp_executesql. However, for the sake of protecting against SQL Injection, let me also mention that, when calling your database strictly through code, you can use two approaches (well, several, but we’ll focus on two to keep this blog post to a manageable size): building up ad hoc T-SQL and executing it against the server directly, or using a mechanism to parameterize your queries. You absolutely should be using the parameterized methods in order to validate your input and avoid SQL Injection.

The Simplest Test

Let’s start with a very simple, and simplified, query in order to illustrate the point:

DECLARE @adhocquery NVARCHAR(max) 
SET @adhocquery = N'SELECT si.StockItemName,
   i.InvoiceDate,
   il.Description
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
   ON il.InvoiceID = i.InvoiceID
JOIN Warehouse.StockItems AS si
ON si.StockItemID = il.StockItemID;'

EXEC (@adhocquery);


DECLARE @sqlquery NVARCHAR(max) 
SET @sqlquery = N'SELECT si.StockItemName,
   i.InvoiceDate,
   il.Description
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
   ON il.InvoiceID = i.InvoiceID
JOIN Warehouse.StockItems AS si
ON si.StockItemID = il.StockItemID;'

EXEC sys.sp_executesql @stmt = @sqlquery;

That’s the same query executed using the two methods in question. The results are an identical execution plan and exactly the same number of reads. If I execute either of them thousands of times then the execution times don’t vary. They have matching query hash and plan hash values. These are identical queries in every possible way. Even if I compare the performance across thousands of executions and include the compile time there is no difference in the outcome.
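
If you want to check that for yourself, the hashes are easy to pull from the cache. This is just my own verification query, filtered so it doesn’t pick itself up:

SELECT qs.query_hash,
   qs.query_plan_hash,
   qs.execution_count,
   qs.total_logical_reads
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%StockItemName%'
   -- exclude this verification query itself
   AND st.text NOT LIKE '%dm_exec_query_stats%'
ORDER BY qs.last_execution_time DESC;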

At the simplest possible level, these are identical mechanisms for executing a basic query. The only performance difference comes about because of parameters.

Test With Parameters

Instead of just running the query over and over again, I really want to test actual, meaningful, behavior this time. I’m going to load all the values for the BillToCustomerID column of the Invoices table using a PowerShell script. Then, I’ll execute the queries once for each of these values, using the two different execution methods.

To make aggregating the results easier, I put each query into a procedure:

CREATE PROCEDURE dbo.AdHoc (@BillToCustomerID INT)
AS
   DECLARE @Query NVARCHAR(MAX);

   SET @Query
      = N'SELECT si.StockItemName,
   i.InvoiceDate,
   il.Description
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
   ON il.InvoiceID = i.InvoiceID
JOIN Warehouse.StockItems AS si
   ON si.StockItemID = il.StockItemID
WHERE i.BillToCustomerID = ' + CAST(@BillToCustomerID AS NVARCHAR(10)) + ';';

   EXEC (@Query);
GO


CREATE PROCEDURE dbo.ExecSQL (@BillToCustomerID INT)
AS
   DECLARE @sqlquery NVARCHAR(MAX);

   SET @sqlquery
      = N'SELECT si.StockItemName,
   i.InvoiceDate,
   il.Description
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
   ON il.InvoiceID = i.InvoiceID
JOIN Warehouse.StockItems AS si
   ON si.StockItemID = il.StockItemID
WHERE i.BillToCustomerID = @BillToCustomerID;';

   DECLARE @parms NVARCHAR(MAX);

   SET @parms = '@BillToCustomerID int';

   EXEC sys.sp_executesql @stmt = @sqlquery,
      @params = @parms,
      @BillToCustomerID = @BillToCustomerID;
GO
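
For reference, here’s a minimal T-SQL stand-in for the PowerShell driver I mentioned above (my own sketch, assuming the WideWorldImporters schema). It walks every BillToCustomerID value and runs both procedures for each one:

DECLARE @BillToCustomerID INT;

DECLARE BillToCursor CURSOR FAST_FORWARD FOR
   SELECT DISTINCT i.BillToCustomerID
   FROM Sales.Invoices AS i;

OPEN BillToCursor;
FETCH NEXT FROM BillToCursor INTO @BillToCustomerID;

WHILE @@FETCH_STATUS = 0
BEGIN
   -- run both versions against the same value so the comparison is fair
   EXEC dbo.AdHoc @BillToCustomerID = @BillToCustomerID;
   EXEC dbo.ExecSQL @BillToCustomerID = @BillToCustomerID;

   FETCH NEXT FROM BillToCursor INTO @BillToCustomerID;
END;

CLOSE BillToCursor;
DEALLOCATE BillToCursor;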

The results are fun.

Execution Type Average Duration
sp_executesql AVG: 57946.03187251
Ad Hoc AVG: 14788.8924302789

What’s going on? Is the conclusion that, in fact, ad hoc queries are faster than sp_executesql?

Absolutely not.

I cheated.

I intentionally picked a data set with a pretty interesting distribution. Depending on the value passed for BillToCustomerID there is the possibility of one of three different execution plans:

[Figure: sp_executesql execution plans]

In fact, the data is such that the first value that would be called is going to generate the worst possible plan for all the other data sets because it leads to the plan that simply consists of three scans. Even if I choose to force one of the other plans first, something I did several times while testing, the fact that the ad hoc queries will always generate the best plan for the data set results in better overall performance for ad hoc, in this instance.

Please don’t mistake me. I could skew the data in another direction in order to make sp_executesql the better performing mechanism. The initial premise was that you should use sp_executesql over ad hoc because it will be faster. That’s not the case. In fact, which of these methods will be faster depends completely on a number of factors. That said, my preferred mechanism is to use sp_executesql because it creates parameterized queries where I can ensure, with certainty, that I’m avoiding SQL Injection. To achieve parity on execution times, I could simply include a RECOMPILE hint, and then I would have the same speed as the ad hoc approach while still ensuring my security.
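
If you want to see that for yourself, the way I’d wire it in (my own tweak, not part of the test above) is to put the hint on the statement inside the dynamic string in dbo.ExecSQL:

   -- same statement as before, with OPTION (RECOMPILE) added
   SET @sqlquery
      = N'SELECT si.StockItemName,
   i.InvoiceDate,
   il.Description
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
   ON il.InvoiceID = i.InvoiceID
JOIN Warehouse.StockItems AS si
   ON si.StockItemID = il.StockItemID
WHERE i.BillToCustomerID = @BillToCustomerID
OPTION (RECOMPILE);';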

Oh, and to add another wrinkle, you could always turn on ‘optimize for ad hoc workloads’. That shaves a few more milliseconds off the ad hoc approach over the sp_executesql approach in this example.
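
For completeness, turning that setting on is a server-wide change (so please try it somewhere safe first):

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;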

Conclusion

I know I’ve over-emphasized this throughout this discussion, but I’m going to repeat it again, SQL Injection is dangerous and a purely ad hoc approach to queries leads to unsafe servers. You must validate your inputs and use a querying mechanism that ensures that SQL Injection won’t be an issue. This is accomplished by using parameterized queries, which includes sp_executesql.

However, should you be using sp_executesql over ad hoc queries because performance will improve? No. Clearly that’s not the case. Instead you should be using sp_executesql because it’s a safer, saner approach to writing your queries.


I love talking performance tuning. In fact, I’ll be doing an all-day seminar on execution plans and query tuning before SQLSaturday Providence in Rhode Island in December 2016, so if you’re interested, sign up here.

Nov 03 2016

Reinforcing the Importance of Statistics on Row Estimate

I recently wrote an introductory post about the importance of statistics. I just got a reminder of how important they are during my own work.

Bad Estimate

I hit a weird problem while I was setting up a query to illustrate a point (blog post to be published next week). Let me lay out the basis of the problem and explain it. I wanted data with distribution skew, so I ran this query to find out if there was a wide disparity between the top and bottom of the range:

SELECT i.BillToCustomerID,
   COUNT(i.BillToCustomerID) AS TestCount
FROM Sales.Invoices AS i
GROUP BY i.BillToCustomerID
ORDER BY TestCount ASC;

Sure enough, the bottom of the range returned three (3) rows and the top returned 21,551. If I then run a query to retrieve just a few rows like this:

SELECT *
FROM Sales.Invoices AS i
WHERE i.BillToCustomerID = 1048;

I get the following execution plan:

[Figure: SimplePlan execution plan]

I’m happy because this is the plan I expected. With this plan in hand, I don’t bother looking at anything else.

Creating a Problem

I expand out the query initially as follows:

SELECT i.InvoiceID,
   il.InvoiceLineID,
   si.StockItemName
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
   ON il.InvoiceID = i.InvoiceID
JOIN Warehouse.StockItems AS si
ON si.StockItemID = il.StockItemID
WHERE i.BillToCustomerID = 1048;

The execution plan now looks like this:

[Figure: complexplan execution plan]

Frankly, I’m puzzled. Why on earth did we go from a key lookup operation to a scan on the Invoices table? I rebuild the query a couple of times and it keeps going to a scan. Finally, I pause a moment and look at the row estimate (you know, like I should have done the first moment I was puzzled):

[Figure: row estimate]

258 rows? Wait, that’s wrong. The number of rows for this value is three. Why on earth would it be showing 258? There’s no reason. I haven’t done any kind of calculation on the columns. I double check the structures. No hidden views or constraints, or anything else that would explain why the estimate was so wrong. However, it’s clear that the estimate of 258.181 is causing the nested loops join and key lookup to go away in favor of a hash join and scan when I add complexity to the query and the optimizer has to rely on that row estimate.

After thinking about it a while, I finally ran DBCC SHOW_STATISTICS:
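
The call itself looks like the following. The index name is an assumption on my part, so check sys.indexes on Sales.Invoices for the statistics object the plan actually used:

-- index/statistics name assumed; adjust to match your copy of the database
DBCC SHOW_STATISTICS ('Sales.Invoices', FK_Sales_Invoices_BillToCustomerID)
WITH HISTOGRAM;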

[Figure: DBCC SHOW_STATISTICS histogram output]

Note the highest point on the histogram, 1047. Yet I’m passing in 1048.

So, what’s happening?

While the number of rows for 1048 was the lowest, at 3, unfortunately it seems that the 1048 values were added to the table after the statistics for the index had been updated. Instead of using something from the histogram, my value fell outside the range it covers. When the value is outside the histogram, the Cardinality Estimator uses the average number of rows per value across the entire histogram, 258.181 (at least for any database that’s on SQL Server 2014 or greater and not running in a lower compatibility mode), as the row estimate.
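
Of course, the other way out of this (not what I wanted here, since I was deliberately chasing skew) is to update the statistics so the histogram covers the newer values:

-- refresh all statistics on the table so 1048 lands inside the histogram
UPDATE STATISTICS Sales.Invoices WITH FULLSCAN;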

I then changed the query to use the value 1047, and the execution plan changed to look like this:

[Figure: cleanplan execution plan]

The new plan reflects the behavior I was going for when I was setting up the test. The row estimates are now accurate, and small, therefore I get a key lookup operation instead of a scan.

Conclusion

Statistics drive the decisions made by the optimizer. The very first moment you’re looking at an execution plan and you’re seeing a scan where you thought, for sure, you should have seen a seek, check the row estimates (OK, not the first moment, it could be a coding issue, structural issue, etc.). It could be that your statistics are off. I just received my own reminder to pay more attention to the row estimates and the statistics.


I love playing with statistics and execution plans and queries. As a result, I also like teaching how to do this stuff. If you’re interested, I’m putting on a class in Rhode Island, December 2016. Sign up here.

Nov 01 2016

Stored Procedures Are Not Faster Than Views

A performance tuning tip I saw recently said, “Views don’t perform as well as stored procedures.”

<sigh>

Let’s break this down, just a little.

Definitions

A view is nothing but a query. The definition given by Microsoft is that it’s a virtual table that’s defined by a query. It’s a query that is used to mask data, perform a complex join, or similar behaviors. Views are queries that get stored in the database. Views can easily be referred to as if they were tables. That’s it. I’ve written in the past about views, including how they can possibly perform poorly.

A stored procedure is also a query, or a series of queries, or a whole lot more. Microsoft’s definition of a stored procedure basically defines it as a programming object that can accept input through parameters, perform actions, and provide various types of output. Stored procedures are also stored in the database, but that’s about the end of the direct correlation to a view. Heck, you can call views from stored procedures, so I’m really at a loss as to where this tip comes from.

In short, yes, both these objects have in them queries, but these are fundamentally different objects. You can’t really say that using one or the other is faster because they each do different things. Further, you can write code that will perform poorly using either construct.

Test Setup To Compare Performance

Here’s the view definition I’m going to use for the tests:

CREATE VIEW dbo.CustomerDeliveryInfo
AS
SELECT c.CustomerName,
   c.DeliveryRun,
   c.RunPosition,
   dm.DeliveryMethodName,
   cid.CityName AS DeliveryCity,
   cip.CityName AS PostalCity,
   c.CustomerID
FROM Sales.Customers AS c
JOIN Application.DeliveryMethods AS dm
   ON dm.DeliveryMethodID = c.DeliveryMethodID
JOIN Application.Cities AS cid
   ON cid.CityID = c.DeliveryCityID
JOIN Application.Cities AS cip
   ON cip.CityID = c.PostalCityID;

We’re going to compare that with a stored procedure that uses the same query. The procedure also takes advantage of the fact that it is a stored procedure using a parameter for input values:

CREATE PROCEDURE dbo.CustomerDeliveryInformation 
(@CustomerID INT)
AS
BEGIN
   SELECT c.CustomerName,
      c.DeliveryRun,
      c.RunPosition,
      dm.DeliveryMethodName,
      cid.CityName AS DeliveryCity,
      cip.CityName AS PostalCity,
      c.CustomerID
   FROM Sales.Customers AS c
   JOIN Application.DeliveryMethods AS dm
      ON dm.DeliveryMethodID = c.DeliveryMethodID
   JOIN Application.Cities AS cid
      ON cid.CityID = c.DeliveryCityID
   JOIN Application.Cities AS cip
      ON cip.CityID = c.PostalCityID
   WHERE c.CustomerID = @CustomerID;
END;

We’re also going to create another stored procedure that uses the view:

CREATE PROCEDURE dbo.InfoCustomerDelivery 
(@CustomerID INT)
AS
BEGIN
   SELECT * FROM dbo.CustomerDeliveryInfo AS cdi
   WHERE cdi.CustomerID = @CustomerID;
END;

Because stored procedures and views are different, we’ll have to call these different objects in different ways:

SELECT *
FROM dbo.CustomerDeliveryInfo AS cdi
WHERE cdi.CustomerID = 556;

EXEC dbo.CustomerDeliveryInformation @CustomerID = 556;

EXEC dbo.InfoCustomerDelivery @CustomerID = 556;

In this way we can run each of these queries independently and compare the results between them.

Results Comparing Stored Procedures With Views

If you run each of the queries above, you will find that they all create a nearly identical execution plan:

[Figure: view execution plan]

You can click on that to make it bigger. If we compare all the different plans, one set of details does stand out:

[Figure: plan differences]

There is a difference in the compile time between the view by itself and the stored procedures (the two procedures were almost identical to each other). Let’s look at performance over a few thousand executions:

Query duration
View AVG: 210.431431431431
Stored Proc w/ View AVG: 190.641641641642
Stored Proc AVG: 200.171171171171

This is measured in microseconds, so the variation we’re seeing is likely just some disparity in I/O, CPU, or something else, since the differences are trivial at 10mc, or 5%. While that may seem like the view is suffering, please note that the view inside the procedure actually ran faster by 5%. Again, this is explained by the fact that we’re only talking about a 10 microsecond difference. I’m not sure if that’s within the margin for error on the Extended Events sql_batch_completed event or not (I couldn’t find documentation stating what it might be), but I’ll bet it’s close. I believe it’s safe to say that the average performance of these queries is identical.

All three queries had 8 logical reads.

What about execution time including compile time, since there is a difference:

Query duration
View AVG: 10089.3226452906
Stored Proc AVG: 9314.38877755511
Stored Proc w/ View AVG: 9938.05410821643

Including compile time, the procedure alone is 700mc better on average than the view. That’s an 8% difference. It was almost that high, at 7%, for the procedure that used the view.

If we’re just talking compile time, then, there is a significant win if we avoid the view. This is no doubt because of the extra work involved in unpacking the view and going through the simplification process within the optimizer. Plus, the view alone in our query was parameterized by the optimizer in order to assist its performance over time (as we saw in the average results without the recompile). All that extra work explains the 8% difference.

Let’s Break It

What if we change the query around a little? I decide that all I want to see right now from the view is the CustomerID:

SELECT cdi.CustomerID
FROM dbo.CustomerDeliveryInfo AS cdi
WHERE cdi.CustomerID = 556;

When I execute this, I get a whole new execution plan:

[Figure: viewsimple execution plan]

The execution time drops a little, to around 190mc on average, and the reads go from 8 to 2. The stored procedure would have to be rewritten to return only CustomerID. Does that mean that views are faster than stored procs? Absolutely not. It just means that there is some degree of flexibility built into the view, as a construct, that’s not there in a stored procedure, as a construct. These are fundamentally different objects.
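
Purely as an illustration (this isn’t part of the test above), that rewrite might look like the following, dropping the joins that are no longer needed for the single column:

ALTER PROCEDURE dbo.CustomerDeliveryInformation
(@CustomerID INT)
AS
BEGIN
   -- only the column we need; the delivery-method and city joins go away
   SELECT c.CustomerID
   FROM Sales.Customers AS c
   WHERE c.CustomerID = @CustomerID;
END;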

What if we change the query against the view again:

SELECT *
FROM dbo.CustomerDeliveryInfo AS cdi
WHERE cdi.CustomerName = 'Om Yadav';

Once more the execution plan will change to something different than before:

[Figure: viewcomplex execution plan]

Performance drops to about 300mc and we get 10 reads instead of 8. Does that mean that views are slower than stored procedures? No. We’re attempting to compare two different objects that perform two different functions within SQL Server.

Conclusion

Since a stored procedure can actually query a view, suggesting that we use stored procedures instead of views becomes quite problematic. With the exception of the differences in compile time, we see that views actually perform exactly the same as stored procedures, if the query in question is the same. There are reasons to use views as well as reasons to not use them. There are reasons to use stored procedures as well as reasons to not use them. Neither of these objects is preferred above the other because of performance concerns.


Want to play some more with execution plans and query tuning? I’ll be doing an all-day seminar on execution plans and query tuning before SQLSaturday Providence in Rhode Island in December 2016, so if you’re interested, sign up here.

Oct 24 2016

A Sub-Query Does Not Hurt Performance

The things you read on the internet, for example, “don’t use a sub-query because that hurts performance.”

Truly?

Where do people get these things?

Let’s Test It

I’ve written before about the concept of cargo cult data professionals. They see one issue, one time, and consequently extrapolate that to all issues, all the time. It’s the best explanation I have for why someone would suggest that a sub-query is flat out wrong and will hurt performance.

Let me put a caveat up front (which I will reiterate in the conclusion, just so we’re clear): there’s nothing magically good about sub-queries, just like there is nothing magically evil about them. You can absolutely write a sub-query that performs horribly, does horrible things, runs badly, and therefore absolutely screws up your system. Just as you can with any kind of query. I am addressing the bad advice that sub-queries are to be avoided because they will inherently lead to poor performance.

Let’s start with a simple test, just to validate the concept of how a sub-query performs within SQL Server:

SELECT sd.OrderQty,
   pr.Name
FROM
   (SELECT *
    FROM Sales.SalesOrderDetail AS sod
   ) AS sd
JOIN
   (SELECT *
    FROM Production.Product AS p
   ) AS pr
   ON pr.ProductID = sd.ProductID
WHERE sd.SalesOrderID = 52777;

SELECT sod.OrderQty,
   p.Name
FROM Sales.SalesOrderDetail AS sod
JOIN Production.Product AS p
   ON p.ProductID = sod.ProductID
WHERE sod.SalesOrderID = 52777;

If there is something inherently wrong with a sub-query, then there is something twice as wrong with two sub-queries. Here are the resulting execution plans:

[Figure: the sub-query plan matches the query plan]

Huh. They look, sort of, I don’t know, almost identical. Let’s compare the plans using the new SSMS plan comparison utility:

[Figure: only slight differences in the sub-query plan]

Well, darn. Displayed in pink are the common sets of operations between the two plans. In other words, for these plans, everything except the properties of the SELECT operator is exactly the same. Let’s take a look at those properties:

[Figure: sub-query SELECT operator properties]

OK. Now we have some interesting differences and, especially, some interesting similarities. Let’s start with the similarities. First of all, we have exactly the same QueryPlanHash value in both plans. In addition, we also have identical estimated rows and costs. In short, the optimizer created two identical execution plans. Now, this is where things get a little bit interesting. See, the optimizer actually worked a little harder to create the first plan than the second. It took an extra tick on the CPU and just a little more CompileMemory and CompileTime. Interesting.

What about execution times? Averaged over a few runs, the execution times were identical at about 149mc with 11 reads. However, running a query once or twice isn’t testing. Let’s get a few thousand runs of both queries. The average results from the Extended Events sql_batch_completed event were 75.9 microseconds for both queries.

However, what about that extra little bit of compile time in the query that used sub-queries? Let’s add in a statement to free the procedure cache on each run and retry the queries. There is a measurable difference now:

Query duration
Sub-query AVG: 5790.20864172835
Query AVG: 4539.49289857972
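
(The statement I added to free the procedure cache is nothing more exotic than the one below. Please don’t run it against a production server, since it throws out every plan on the instance.)

-- test servers only: flushes the entire plan cache before each execution
DBCC FREEPROCCACHE;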

More work is done by the optimizer on the sub-query version to compile the same execution plan. We’re adding work for the optimizer, requiring it to unpack the, admittedly, silly query written above. When we refer only to the compile time and not the execution time, there is a performance hit. Once the query is compiled, the performance is identical. Whether or not you get a performance hit from a sub-query, then, depends in part on the degree to which you’re experiencing compiles or recompiles. Without the recompile, there is no performance hit. At least in this example.

Let’s Test It Again, Harder

I firmly believe in the old adage; if you ain’t cheatin’, you ain’t fightin’. It’s time to put the boot in.

Let’s go with much more interesting queries that are more likely to be written than the silly example above. Let’s assume some versioned data, like in this article on Simple-Talk. The article shows three ways to express a query that brings back a single version of one of the documents. We’re just going to mess with two of them: one that uses a sub-query, and one that does not:

--no sub-query
SELECT TOP 1 d.DocumentName,
   d.DocumentID,
   v.VersionDescription,
   v.VersionID,
   ROW_NUMBER() OVER (ORDER BY v.VersionID DESC) AS RowNum
FROM dbo.Document d
JOIN dbo.Version v
   ON d.DocumentID = v.DocumentID
WHERE d.DocumentID = 9729;

--sub-query
SELECT  d.[DocumentName],
        d.[DocumentId],
        v.[VersionDescription],
        v.[VersionId]
FROM    dbo.[Document] d
        CROSS APPLY (SELECT TOP (1)
                            v2.VersionId,
                            v2.VersionDescription
                     FROM   dbo.[Version] v2
                     WHERE  v2.DocumentId = d.DocumentId
                     ORDER BY v2.DocumentId,
                            v2.VersionId DESC
                    ) v
WHERE   d.[DocumentId] = 9729;

As per usual, we can run these once and compare results, but that’s not really meaningful. We’ll run them thousands of times. Also, to be sure we’re comparing apples to apples, we’ll force a recompile on every run, just like in the first set of tests. The results this time:

Query duration
Sub-query AVG: 1852.14114114114
Query AVG: 2022.62162162162

You’ll note that, even with the compile on each execution, the query using a sub-query actually out-performed the query that was not using a sub-query. The results are even more dramatic when we take away the compile time:

Query duration
Sub-query AVG: 50.8368368368368
Query AVG: 63.3103103103103

We can also look to the execution plans to get an understanding of how these queries are being resolved:

[Figure: the two different execution plans]

The plan on top is the sub-query plan; the plan on the bottom is the plan for the plain query. You can see that the regular query is doing a lot more work to arrive at an identical set of data. The differences are visible in the average execution time, about a 20% improvement.

You could argue that we’re comparing two completely different queries, but that’s not true. Both queries return exactly the same result set. It just so happens that the query using the sub-query performs better overall in this instance. In short, there’s no reason to be scared of using a sub-query.

Sub-Query Conclusion

Is it possible for you to write horrid code inside of a sub-query that seriously negatively impacts performance? Yes. Absolutely. I’m not arguing that you can’t screw up your system with poor coding practices. You absolutely can. The query optimization process within SQL Server deals well with common coding practices. Therefore, the queries you write can be fairly sophisticated before, by nature of that sophistication, you begin to get serious performance degradation.

You need to have a method of validation for some of what you read on the internet. People should provide both the queries they are testing with and the numbers that their tests showed. If you’re just seeing completely unsupported, wildly egregious statements, they’re probably not true.

In conclusion, it’s safe to use sub-queries. Just be careful with them.


If you’re finding any of this useful and you’d like to dig down a little more, you can, because I’ll be putting on an all-day seminar on execution plans and query tuning. The event takes place before SQLSaturday Providence in Rhode Island in December 2016, so if you’re interested, sign up here.

Oct 18 2016

SQL Cruise as a Speaker and Sponsor

SQL Cruise offers a number of unique opportunities for everyone involved, including speakers and sponsors. I’ve written before, several times, about the benefits of SQL Cruise. As an attendee of the cruise, you get to sit in classes by amazing people (and me) talking about all aspects of the Data Platform. Further, you get the opportunity to sit down, for long hours, with these people and get, for want of a better description, free consulting time. What about as a speaker and a sponsor, though? Do we get benefits?

As A Speaker

Tim Ford is quite a bright fellow. He has tweaked and tuned SQL Cruise. I have been on SQL Cruise a number of times over the last five years, and it has changed radically since my first cruise in 2011. He keeps making the format better and better, trimming away a little fat here, adding a little muscle there. All this results in a better SQL Cruise experience which, as a speaker, gives me three things: engagement, time, and follow-up. Let’s talk about these.

Engagement

The people who attend SQL Cruise run from almost complete amateurs within the Data Platform to MCMs who know more than I do about, well, everything. All these people are at SQL Cruise to take part in SQL Cruise. They’ve read the brochures, the web site, and testimonials from people like me. They know that this is an opportunity to get their learn on in a way they won’t get elsewhere. This means those people show up demanding the best out of you. They want you to deliver information that will be useful to them, and they’re going to pay attention to what you say. They ask questions. They engage and they engage directly and well.

As a speaker, this engagement is exciting. It means I have to prepare, properly and thoroughly. I have to pick my topics well (and oh boy, do Tim and I go round & round on that). The engagement means I’m going to have a good time presenting. Don’t believe me? Present a session to your wall. That’s a complete lack of engagement. Was it fun? Now go and present on SQL Cruise where everyone is hanging on your every word. It’s a blast.

Further, the engagement does mean that I have to prep well to present to these people. That means I’m learning the topics I’m going to present on even better. I’ve heard it said, and I believe it, if you really want to know a topic well, teach it. If you want to know a topic more than well, teach it on SQL Cruise.

Time

The class-time on a SQL Cruise is currently about two hours per session. Pick a topic like the Query Store and try to present a substantial chunk of it in your standard one-hour time slot. It’s going to be either highly detailed on only a sub-set of the topic, or it’s going to be very superficial. Two hours gives me enough time as a speaker to delve into the topic and explore it. With an engaged audience, I get lots of questions, and still have time to do the entire presentation. At no point am I feeling rushed. I’m not forced to dump lots of things until later. I have the time I need to get the point across that I hope to teach. However, I have even more time because of the follow-up.

Follow-Up

SQL Cruise does not simply consist of class time and then bar time. There are also the Office Hours. This is dedicated time to let the attendees and the speakers mingle. You can sit down, yes, with a drink in hand if you’re so inclined, with the people you’ve just been talking to for two hours and have a more in depth conversation. It’s not simply standing in front and dictating to people. These are your peers. Some of them may become your friends. You get to drill down on topics and connect these topics to other topics. With the topics well established you have the opportunity to tie it all back in to people’s every-day jobs. The follow-up turns a typical teaching experience into a much more intimate affair where you can really share in ways you just can’t at an event with 400 people in attendance.

As A Sponsor

Why on earth would Redgate spend money sponsoring people going on a cruise? After all, we’re only going to talk to 30 people. Couldn’t we invest the same amount of money in an event with 300 people and get ten times the impact? If only sponsorship in events worked that way. Vendors would simply fight with one another to support only the largest events because if we’re at an event with 300/3000/300000 people, we automatically get that many eyeballs, right? Nope. It doesn’t work that way at all. SQL Cruise is worth investing in because we get the eyeballs of the right people who will amplify our message.

Eyeballs

Working the booth at events for Redgate Software, I’ll talk to a ton of people. At a well structured event, all the attendees will have walked past my booth several times. All that attention probably results in a good engagement with, maybe 10-15% of the event. Sometimes more, sometimes less. I’m not sharing the hard numbers (and yes, we count). That number is a subset, but not at SQL Cruise. At SQL Cruise, as a sponsor, we get all the eyeballs. They’re ours, not just for 10 minutes, but for several days. No, if you’re on the cruise it’s not some long advertisement, but you’re aware of the sponsors. You know who they are. If they have a representative there, the attendees, each and every one, talks to them. It’s 100%.

The Right People

Even more important than the fact that we’re getting 100% of the people to hear whatever message we care to share is the fact that we’re sharing with the right people. The people on SQL Cruise are the ones who can convince their boss, a little against common sense, that not only will going on a cruise be good for the individual, it will be good for the company. You think I don’t want to show these people how we can help them solve problems? Yes I do. Sometimes the person on SQL Cruise is the boss. Do I want to talk to them? Yes. Every so often, the person on the cruise is junior, or just getting going. Do I still want to talk to them? Yes, because these people, along with the others provide amplification. As a sponsor, amplification is everything.

Amplification

Very few people come out of SQL Cruise who are not impassioned by what they’ve learned. They’re rested, recharged, and filled with new information. They let the world know. Further, many people come out of SQL Cruise and change their careers. I know a large number of speakers, bloggers, and writers, all of whom got their start on the cruise. They share. They share a lot. One of the things they share is the message of the sponsor on the cruise. Suddenly, instead of the 30 sets of eyeballs that I reached on the cruise, I’m reaching a lot more. That’s because the people on the cruise become very active, very vocal, and very supportive of the sponsor that helped provide them with a life-changing experience. This provides a huge impact for us as a sponsor.

Ready To Cruise Again

I’m taking part in the next cruise in January in the Caribbean. I’ve already worked out with Tim the topics I’ll be covering. I’m prepping the material now because it does take that long to get things ready for SQL Cruise. I’m lobbying work to get on the Alaska cruise later in the year. In short, I’m ready to go on SQL Cruise again. I’d sure like the opportunity to share the experience with you. Go here to sign up.

Oct 17 2016

SELECT * Does Not Hurt Performance

I read all the time how SELECT * hurts performance. I even see where people have said that you just have to supply a column list instead of SELECT * to get a performance improvement. Let’s test it, because I think this is bunkum.

The Test

I have here two queries:

SELECT *
FROM Warehouse.StockItemTransactions AS sit;

--and

SELECT sit.StockItemTransactionID,
       sit.StockItemID,
       sit.TransactionTypeID,
       sit.CustomerID,
       sit.InvoiceID,
       sit.SupplierID,
       sit.PurchaseOrderID,
       sit.TransactionOccurredWhen,
       sit.Quantity,
       sit.LastEditedBy,
       sit.LastEditedWhen
FROM Warehouse.StockItemTransactions AS sit;

I’m basically going to run this a few hundred times each from PowerShell. I’ll capture the executions using Extended Events and we’ll aggregate the results.
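
If you want to recreate the measurement, a minimal Extended Events session along these lines does the job. The session and file names are mine, so adjust them, and the database filter, to suit:

CREATE EVENT SESSION QueryPerformance ON SERVER
ADD EVENT sqlserver.sql_batch_completed
   (WHERE (sqlserver.database_name = N'WideWorldImporters'))
ADD TARGET package0.event_file
   (SET filename = N'QueryPerformance')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO

ALTER EVENT SESSION QueryPerformance ON SERVER STATE = START;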

The Results

I ran the test multiple times because, funnily enough, I kept seeing some disparity in the results. One test would show a clear bias for one method; another test would show the opposite. However, averaging the averages, we see that things broke down as follows:

SELECT *: 167.247ms
Column List: 165.500ms

That’s after about 2,000 separate executions of each query. There’s a 2ms bias towards the Column List query as opposed to the SELECT *. That’s an improvement, if you want to call it that, of 1%. It’s hardly worth the bother, assuming that with more testing this continued to hold true. In multiple tests, the SELECT * ran faster. I just feel honor-bound to put up the full results. They show an improvement, but not one I’d get excited about. Oh, and the reads, the execution plan, everything else… identical.

SELECT * Conclusion

Don’t get me wrong, there are lots of reasons not to use SELECT *. Yes, performance is one of them. However, when most people suggest that using SELECT * is a bad idea for performance reasons, what they’re saying is that you ought to move only the columns you need and the data you are actually using, not everything. I’m not aware of anyone with experience and knowledge suggesting that using the complete column list instead of SELECT * is faster. As we can see in the tests above, it isn’t (or it is, but by so small a margin that nobody should care).


I love talking performance tuning. In fact, I’ll be doing an all-day seminar on execution plans and query tuning before SQLSaturday Providence in Rhode Island in December 2016, so if you’re interested, sign up here.