Oct 24 2016

A Sub-Query Does Not Hurt Performance

The things you read on the internet, for example, “don’t use a sub-query because that hurts performance.”

Truly?

Where do people get these things?

Let’s Test It

I’ve written before about the concept of cargo cult data professionals. They see one issue, one time, and consequently extrapolate that to all issues, all the time. It’s the best explanation I have for why someone would suggest that a sub-query is flat out wrong and will hurt performance.

Let me put a caveat up front (which I will reiterate in the conclusion, just so we’re clear), there’s nothing magically good about sub-queries just like there is nothing magically evil about sub-queries. You can absolutely write a sub-query that performs horribly, does horrible things, runs badly, and therefore absolutely screws up your system. Just as you can with any kind of query. I am addressing the bad advice that a sub-query is to be avoided because they will inherently lead to poor performance.

Let’s start with a simple test, just to validate the concept of how a sub-query performs within SQL Server:

SELECT sd.OrderQty,
   pr.Name
FROM
   (SELECT *
    FROM Sales.SalesOrderDetail AS sod
   ) AS sd
JOIN
   (SELECT *
    FROM Production.Product AS p
   ) AS pr
   ON pr.ProductID = sd.ProductID
WHERE sd.SalesOrderID = 52777;

SELECT sod.OrderQty,
   p.Name
FROM Sales.SalesOrderDetail AS sod
JOIN Production.Product AS p
   ON p.ProductID = sod.ProductID
WHERE sod.SalesOrderID = 52777;

If there is something inherently wrong with a sub-query, then there is something twice as wrong with two sub-queries. Here are the resulting execution plans:

sub-query plan matches query plan

Huh, they look, sort of, I don’t know, almost identical. Let’s compare the plans using the new SSMS plan comparison utility:

only slight differences in sub-query plan

Well, darn. Displayed in pink are the common sets of operations between the two plans. In other words, for these plans, everything except the properties of the SELECT operator are exactly the same. Let’s take a look at those properties:

sub-query SELECT properties

OK. Now we have some interesting differences, and especially, some interesting similarities. Let’s start with the similarities. First of all, we have exactly the same QueryPlanHash value in both plans. In addition, we also have identical estimated rows and costs. In short, the optimizer created two identical execution plans. Now, this is where things get a little bit interesting. See, the optimizer actually worked a little harder to create the first plan than the second. It took an extra tic on the CPU and just a little more CompileMemory and CompileTime. Interesting.
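If you’d rather confirm that matching hash outside of the graphical plan, the same value is exposed in sys.dm_exec_query_stats. Here’s a minimal sketch; the LIKE filter is just an assumption about how you’d pick out the two test statements:

SELECT dest.text,
       deqs.query_hash,
       deqs.query_plan_hash,
       deqs.execution_count
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
--the filter below is a placeholder; adjust it to match your test batches
WHERE dest.text LIKE '%Sales.SalesOrderDetail%'
ORDER BY deqs.query_plan_hash;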

What about execution times? Averaged over a few runs, the execution times were identical at about 149 microseconds with 11 reads. However, running a query once or twice isn’t testing. Let’s get a few thousand runs of both queries. The average results from the Extended Events sql_batch_completed event were 75.9 microseconds for both queries.
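For reference, the capture itself doesn’t require anything fancy. Here’s a rough sketch of the kind of Extended Events session I mean; the session name, the file name, and the database filter are all placeholders, so adjust to taste:

CREATE EVENT SESSION BatchTiming
ON SERVER
ADD EVENT sqlserver.sql_batch_completed
    (WHERE (sqlserver.database_name = N'AdventureWorks2014'))
ADD TARGET package0.event_file
    (SET filename = N'BatchTiming')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO

ALTER EVENT SESSION BatchTiming
ON SERVER
STATE = START;
GO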

However, what about that extra little bit of compile time in the query that used sub-queries? Let’s add in a statement to free the procedure cache on each run and retry the queries. There is a measurable difference now:

Query duration
Sub-query AVG: 5790.20864172835
Query AVG: 4539.49289857972

More work is done by the optimizer on the sub-query to compile the same execution plan. We’re adding work to the optimizer, requiring it to unpack the, admittedly, silly query written above.  When we refer only to the compile time and not the execution time, there is a performance hit. Once the query is compiled, the performance is identical. Whether or not you get a performance hit from a sub-query then, in part, depends on the degree to which you’re experiencing compiles or recompiles. Without the recompile, there is no performance hit. At least in this example.
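If you want to reproduce that last test, the pattern is simply to flush the plan cache before each execution so the compile cost gets paid every time. A minimal sketch, and obviously DBCC FREEPROCCACHE belongs nowhere near a production server:

--force a compile on every execution by clearing the plan cache first
DBCC FREEPROCCACHE;
GO
SELECT sd.OrderQty,
   pr.Name
FROM
   (SELECT *
    FROM Sales.SalesOrderDetail AS sod
   ) AS sd
JOIN
   (SELECT *
    FROM Production.Product AS p
   ) AS pr
   ON pr.ProductID = sd.ProductID
WHERE sd.SalesOrderID = 52777;
GO

DBCC FREEPROCCACHE;
GO
SELECT sod.OrderQty,
   p.Name
FROM Sales.SalesOrderDetail AS sod
JOIN Production.Product AS p
   ON p.ProductID = sod.ProductID
WHERE sod.SalesOrderID = 52777;
GO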

Let’s Test It Again, Harder

I firmly believe in the old adage: if you ain’t cheatin’, you ain’t fightin’. It’s time to put the boot in.

Let’s go with much more interesting queries that are more likely to be written than the silly example above. Let’s assume some versioned data like in this article on Simple-Talk. The article shows three ways to express a query that brings back a single version of one of the documents. We’re just going to mess with two of them: one that uses a sub-query, and one that does not:

--no sub-query
SELECT TOP 1 d.DocumentName,
   d.DocumentID,
   v.VersionDescription,
   v.VersionID,
   ROW_NUMBER() OVER (ORDER BY v.VersionID DESC) AS RowNum
FROM dbo.Document d
JOIN dbo.Version v
   ON d.DocumentID = v.DocumentID
WHERE d.DocumentID = 9729;

--sub-query
SELECT  d.[DocumentName],
        d.[DocumentId],
        v.[VersionDescription],
        v.[VersionId]
FROM    dbo.[Document] d
        CROSS APPLY (SELECT TOP (1)
                            v2.VersionId,
                            v2.VersionDescription
                     FROM   dbo.[Version] v2
                     WHERE  v2.DocumentId = d.DocumentId
                     ORDER BY v2.DocumentId,
                            v2.VersionId DESC
                    ) v
WHERE   d.[DocumentId] = 9729;

As per usual, we can run these once and compare results, but that’s not really meaningful. We’ll run them thousands of times. Also, to be sure we’re comparing apples to apples, we’ll force a recompile on every run, just like in the first set of tests. The results this time:

Query duration
Sub-query AVG: 1852.14114114114
Query AVG: 2022.62162162162

You’ll note that, even with the compile on each execution, the query using a sub-query actually out-performed the query that was not using a sub-query. The results are even more dramatic when we take away the compile time:

Query duration
Sub-query AVG: 50.8368368368368
Query AVG: 63.3103103103103

We can also look to the execution plans to get an understanding of how these queries are being resolved:

the two execution plans differ

The plan on top is the sub-query plan, and the plan on the bottom is the plan for just the plain query. You can see that the regular query is doing a lot more work to arrive at an identical set of data. The differences are visible in the average execution time, about a 20% improvement.

You could argue that we’re comparing two completely different queries, but that’s not true. Both queries return exactly the same result set. It just so happens that the query using the sub-query performs better overall in this instance. In short, there’s no reason to be scared of using a sub-query.

Sub-Query Conclusion

Is it possible for you to write horrid code inside of a sub-query that seriously negatively impacts performance? Yes. Absolutely. I’m not arguing that you can’t screw up your system with poor coding practices. You absolutely can. The query optimization process within SQL Server deals well with common coding practices. Therefore, the queries you write can be fairly sophisticated before, by nature of that sophistication, you begin to get serious performance degradation.

You need to have a method of validation for some of what you read on the internet. People should provide both the queries they are testing with and the numbers that their tests showed. If you’re just seeing completely unsupported, wildly egregious statements, they’re probably not true.

In conclusion, it’s safe to use sub-queries. Just be careful with them.


If you’re finding any of this useful and you’d like to dig down a little more, you can, because I’ll be putting on an all day seminar on execution plans and query tuning. The event takes place before SQLSaturday Providence in Rhode Island in December 2016, so if you’re interested, sign up here.

Oct 18 2016

SQL Cruise as a Speaker and Sponsor

SQL Cruise offers a number of unique opportunities for everyone involved, including speakers and sponsors. I’ve written before, several times, about the benefits of SQL Cruise. As an attendee of the cruise, you will get to sit in classes by amazing people (and me) talking about all aspects of the Data Platform. Further, you get the opportunity to sit down, for long hours, with these people and get, for want of a better description, free consulting time. What about as a speaker and a sponsor though, do we get benefits?

As A Speaker

Tim Ford is quite a bright fellow. He has tweaked and tuned SQL Cruise. I have been on SQL Cruise a number of times over the last five years, and it has changed radically since my first cruise in 2011. He keeps making the format better and better, trimming away a little fat here, adding a little muscle there. All this results in a better SQL Cruise experience which, as a speaker, gives me three things: engagement, time, and follow-up. Let’s talk about these.

Engagement

The people who attend SQL Cruise run from almost complete amateurs within the Data Platform to MCMs who know more than I do about, well, everything. All these people are at SQL Cruise to take part in SQL Cruise. They’ve read the brochures, the web site, and testimonials from people like me. They know that this is an opportunity to get their learn on in a way they won’t get elsewhere. This means those people show up demanding the best out of you. They want you to deliver information that will be useful to them, and they’re going to pay attention to what you say. They ask questions. They engage and they engage directly and well.

As a speaker, this engagement is exciting. It means I have to prepare, properly and thoroughly. I have to pick my topics well (and oh boy, do Tim and I go round & round on that). The engagement means I’m going to have a good time presenting. Don’t believe me? Present a session to your wall. That’s a complete lack of engagement. Was it fun? Now go and present on SQL Cruise where everyone is hanging on your every word. It’s a blast.

Further, the engagement does mean that I have to prep well to present to these people. That means I’m learning the topics I’m going to present on even better. I’ve heard it said, and I believe it, if you really want to know a topic well, teach it. If you want to know a topic more than well, teach it on SQL Cruise.

Time

The class-time on a SQL Cruise is currently about two hours per session. Pick a topic like the Query Store and try to present a substantial chunk of it in your standard one-hour time slot. Either it’s going to be highly detailed on only a subset of the topic, or it’s going to be very superficial. Two hours gives me enough time as a speaker to delve into the topic and explore it. With an engaged audience, I get lots of questions, and still have time to do the entire presentation. At no point am I feeling rushed. I’m not forced to push lots of things off until later. I have the time I need to get across the point that I hope to teach. However, I have even more time because of the follow-up.

Follow-Up

SQL Cruise does not simply consist of class time and then bar time. There are also the Office Hours. This is dedicated time to let the attendees and the speakers mingle. You can sit down, yes, with a drink in hand if you’re so inclined, with the people you’ve just been talking to for two hours and have a more in-depth conversation. It’s not simply standing in front and dictating to people. These are your peers. Some of them may become your friends. You get to drill down on topics and connect these topics to other topics. With the topics well established, you have the opportunity to tie it all back into people’s everyday jobs. The follow-up turns a typical teaching experience into a much more intimate affair where you can really share in ways you just can’t at an event with 400 people in attendance.

As A Sponsor

Why on earth would Redgate spend money sponsoring people going on a cruise? After all, we’re only going to talk to 30 people. Couldn’t we invest the same amount of money in an event with 300 people and get ten times the impact? If only sponsorship in events worked that way. Vendors would simply fight with one another to support only the largest events because if we’re at an event with 300/3000/300000 people, we automatically get that many eyeballs, right? Nope. It doesn’t work that way at all. SQL Cruise is worth investing in because we get the eyeballs of the right people who will amplify our message.

Eyeballs

Working the booth at events for Redgate Software, I’ll talk to a ton of people. At a well structured event, all the attendees will have walked past my booth several times. All that attention probably results in a good engagement with maybe 10-15% of the event. Sometimes more, sometimes less. I’m not sharing the hard numbers (and yes, we count). That number is a subset, but not at SQL Cruise. At SQL Cruise, as a sponsor, we get all the eyeballs. They’re ours, not just for 10 minutes, but for several days. No, if you’re on the cruise it’s not some long advertisement, but you’re aware of the sponsors. You know who they are. If they have a representative there, the attendees, each and every one, talk to them. It’s 100%.

The Right People

Even more important than the fact that we’re getting 100% of the people to hear whatever message we care to share is the fact that we’re sharing with the right people. The people on SQL Cruise are the ones who can convince their boss, a little against common sense, that not only will going on a cruise be good for the individual, it will be good for the company. You think I don’t want to show these people how we can help them solve problems? Yes I do. Sometimes the person on SQL Cruise is the boss. Do I want to talk to them? Yes. Every so often, the person on the cruise is junior, or just getting going. Do I still want to talk to them? Yes, because these people, along with the others provide amplification. As a sponsor, amplification is everything.

Amplification

Very few people come out of SQL Cruise who are not impassioned by what they’ve learned. They’re rested, recharged, and filled with new information. They let the world know. Further, many people come out of SQL Cruise and change their careers. I know a large number of speakers, bloggers, writers, all of whom got their start on the cruise. They share. They share a lot. One of the things they share is the messages of the sponsor on the cruise. Suddenly, instead of the 30 eyeballs that I reached on the cruise, I’m reaching a lot more. That’s because the people on the cruise become very active, very vocal, and very supportive of the sponsor that helped provide them with a life-changing experience. This provides a huge impact for us as a sponsor.

Ready To Cruise Again

I’m taking part in the next cruise in January in the Caribbean. I’ve already worked out with Tim the topics I’ll be covering. I’m prepping the material now because it does take that long to get things ready for SQL Cruise. I’m lobbying work to get on the Alaska cruise later in the year. In short, I’m ready to go on SQL Cruise again. I’d sure like the opportunity to share the experience with you. Go here to sign up.

Oct 17 2016

SELECT * Does Not Hurt Performance

I read all the time how SELECT * hurts performance. I even see where people have said that you just have to supply a column list instead of SELECT * to get a performance improvement. Let’s test it, because I think this is bunkum.

The Test

I have here two queries:

SELECT *
FROM Warehouse.StockItemTransactions AS sit;

--and

SELECT sit.StockItemTransactionID,
       sit.StockItemID,
       sit.TransactionTypeID,
       sit.CustomerID,
       sit.InvoiceID,
       sit.SupplierID,
       sit.PurchaseOrderID,
       sit.TransactionOccurredWhen,
       sit.Quantity,
       sit.LastEditedBy,
       sit.LastEditedWhen
FROM Warehouse.StockItemTransactions AS sit;

I’m basically going to run this a few hundred times each from PowerShell. I’ll capture the executions using Extended Events and we’ll aggregate the results.
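If you don’t feel like scripting it out in PowerShell, a quick-and-dirty alternative is the GO n batch repeat syntax that SSMS and sqlcmd support. A sketch, with the repeat count picked arbitrarily:

--GO followed by a number reruns the preceding batch that many times (SSMS/sqlcmd only)
SELECT *
FROM Warehouse.StockItemTransactions AS sit;
GO 500

--run the column list version the same way and compare the aggregated durations
SELECT sit.StockItemTransactionID,
       sit.StockItemID,
       sit.TransactionTypeID,
       sit.CustomerID,
       sit.InvoiceID,
       sit.SupplierID,
       sit.PurchaseOrderID,
       sit.TransactionOccurredWhen,
       sit.Quantity,
       sit.LastEditedBy,
       sit.LastEditedWhen
FROM Warehouse.StockItemTransactions AS sit;
GO 500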

The Results

I ran the test multiple times because, funny enough, I kept seeing some disparity in the results. One test would show a clear bias for one method, another test would show the opposite. However, averaging the averages we see that things broke down as follows:

SELECT *: 167.247ms
Column List: 165.500ms

That’s after about 2000 separate executions of each query. There’s a 2ms bias towards the Column List query as opposed to the *. That’s an improvement, if you want to call it that, of 1%. It’s hardly worth the bother, assuming that with more testing this continued to hold true. In multiple tests, the SELECT * ran faster. I just feel honor bound to put up the full results. They show an improvement, but not one I’d get excited about. Oh, and the reads, the execution plan, everything else… identical.

SELECT * Conclusion

Don’t get me wrong, there are lots of reasons to not use SELECT *. Yes, performance is one of the reasons to not use SELECT *. However, when most people suggest that maybe using SELECT * is a bad idea for performance reasons, what they’re saying is you ought to only move the columns you need and the data you are actually using, not everything. I’m not aware of anyone with experience and knowledge suggesting that using the complete column list instead of SELECT * is faster. As we can see in the tests above, it isn’t (or is by so small a margin, who cares).


I love talking performance tuning. In fact, I’ll be doing an all day seminar on execution plans and query tuning before SQLSaturday Providence in Rhode Island in December 2016, so if you’re interested, sign up here.

Oct 11 2016

Pre-Summit PASS Board Update

Busy, busy, busy.

A lot of the work around PASS currently is getting ready for Summit. I’m prepping my part in the keynotes. I’m involved in lots of community discussions including SQLSaturday, Chapters and Regional Mentors. We’re setting up the Community Zone for all sorts of activities throughout the week. We’re also going to have one of our few in-person board meetings at Summit. Putting on the single largest gathering of data professionals around the Microsoft Data Platform is actually labor-intensive. The majority of the work is done by the amazing individuals at Christianson & Company, but the board is involved in the necessary decisions and, being the guy who handles finance, I’m in on a lot of those decisions.

Personally, I think we’re putting together, say it with me, THE BEST SUMMIT EVER. I won’t be upset if you withhold judgement until things are actually delivered, but I’m sure I’m right.

I’ve been pushing forward on getting momentum in the Global Growth committee. Tons of help from Wendy Pastrick and Tim Ford has helped me establish what we’re going to be doing. I’ve posted the general goals before, but I’ll reiterate them quickly. We want to maintain the momentum we’ve established in LATAM and reinvigorate momentum in EMEA over the next 9 months or so. In addition to talking to individuals all over the world on this topic as I travel around for work, I’ve also organized and held several meetings with the Regional Mentors for their feedback. All this is an effort to understand what problems people are seeing locally, regardless of what country they are living in. I’ve collected enough information that I think I can write up some SMART goals and deliver them to the board and you in the next week or so.

I’m committed to seeing a global focus in this organization. I know the organization is also committed to having a global focus. We already are doing so much globally. We’re just going to get better and expand the areas where we can help.

A big part of this job entails meetings. Many meetings. There is just so much to discuss. And spreadsheets. Blech. However, that’s what I signed on for when I agreed to move to the Exec.

Don’t ever make the mistake of believing that joining the board won’t involve labor. It will. However, it’s extremely rewarding labor and I know that taking part has improved me as an individual. I’m very honored to be able to help out.

Oh, and make sure you vote in the PASS election. The polls close today. Tim Ford has a very good assessment of what makes a good Director for the board, as well as another call for you to get your votes in. It’s worth a read whether you’ve already voted or not.

Oct 10 2016

Statistics Are Vital For Query Performance

This is post #10 supporting  Tim Ford’s (b|t) initiative on #iwanttohelp, #entrylevel. Read about it here.

When you send a query to your SQL Server database (and this applies to Azure SQL Database, APS, and Azure SQL Data Warehouse), that query is going to go through a process known as query optimization. The query optimization process figures out if you can use indexes to assist the query, whether or not it can seek against those indexes or has to use a scan, and a whole bunch of other stuff. The primary driving force in making these decisions is the set of statistics available on the indexes and on your tables.

What Are Statistics

Statistics are a mathematical construct to represent the data in your tables. Instead of scanning through the data each and every time to determine how many possible rows are going to come back for the query you provided, SQL Server uses statistics, which are gathered and calculated automatically by default, to determine the likely number of rows. I cover the details behind statistics in this article on Simple-Talk. The important thing to remember is that statistics are what drives the optimizer. Without statistics, you would only ever see table scans and performance would be horrible most of the time. Statistics show how many rows are likely to be returned by a given value. Statistics are created on indexes automatically. They are also created on columns that do not have indexes, by default, when that column is referenced in a WHERE, ON or HAVING clause of a query.
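If you want to see what is already in place, the catalog views will show you both flavors. Here’s a minimal sketch against an example table; the auto-created column statistics are the ones whose names start with _WA_Sys_:

--list the statistics on one table; auto_created = 1 marks the column statistics
--SQL Server built for you, the rest came from indexes or manual creation
SELECT s.name,
       s.auto_created,
       s.user_created,
       c.name AS column_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
   ON sc.object_id = s.object_id
   AND sc.stats_id = s.stats_id
JOIN sys.columns AS c
   ON c.object_id = sc.object_id
   AND c.column_id = sc.column_id
WHERE s.object_id = OBJECT_ID('Sales.SalesOrderDetail');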

It can’t be over-emphasized that you must have statistics for the optimizer to make good choices. You can find out more about statistics in this other article on Simple-Talk.

Automatic Settings For Statistics

Statistics are so important that SQL Server will automatically create them for you. There is a setting that allows you to partially change this behavior. AUTO_CREATE_STATISTICS is enabled by default on each database. You can disable it using an ALTER DATABASE command. What will then happen is that the only statistics you’ll have in your system are those that are created for indexes. For the overwhelming majority of databases out there, turning off the automatic creation of statistics is a very bad idea. You’re taking away the ability of the optimizer to determine the likely rows being returned by a query. That means the optimizer is likely to make poor choices.

Statistics are also automatically maintained by SQL Server. You can read about the automatic processes in this documentation at MSDN. There are a number of modifications you can make to this automated behavior, and you can turn off automatic statistics maintenance by changing the AUTO_UPDATE_STATISTICS setting on a database. Just like the automatic creation of statistics, most systems benefit from the automatic maintenance of statistics. Don’t turn this setting off unless you are ready to take direct control over your statistics. Interestingly enough though, the automatic statistics maintenance may not be enough for many systems. You might need to augment statistics maintenance.
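If you want to check where a given database stands on either setting, or change it, the commands are straightforward. A sketch; the database name is just a placeholder:

--check the current settings
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on
FROM sys.databases
WHERE name = 'MyDatabase';

--both are on by default; only turn them off if you are ready to own statistics yourself
ALTER DATABASE MyDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;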

Maintaining Your Statistics

For most systems, start by leaving the AUTO_UPDATE_STATISTICS setting on. Then, as you find you might need to manually update statistics, you can do one of two things. An easy way to update your statistics is to run sp_updatestats. This command will update the statistics across an entire database. You can somewhat control how it behaves. Read more about it here on MSDN. If you want to take direct control over how any given set of statistics is maintained, you use UPDATE STATISTICS. This gives the maximum amount of control.
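A quick sketch of both approaches; the table, index, and statistics names below are placeholders:

--update every statistic in the current database that has seen modifications
EXEC sys.sp_updatestats;

--or take direct control, one table or one statistics object at a time
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;
UPDATE STATISTICS dbo.MyTable IX_MyTable_MyColumn WITH SAMPLE 50 PERCENT;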

In addition to writing your own code to maintain statistics, you might look to third party choices. One very common option is to use the scripts provided by Ola Hallengren. Not only do they manage and maintain statistics, but they also help with index fragmentation. A newer and much more interesting choice is to look to Minion Reindex by the Midnight DBA team. This tool offers quite a lot of control and functionality that is not present in Ola’s scripts. I wrote a review of an earlier version of the tool here on SQL Server Central.

Conclusion

This blog post just skims the surface of statistics. For lots more detail, please follow and read the links mentioned above. You must ensure that your SQL Server databases have statistics. You must also ensure that your SQL Server databases maintain their statistics. Statistics are a vital part of query performance; ignoring them will hurt.

Oct 05 2016

Where Do We Go To Share?

No one reads blogs any more.

Twitter is dying.

Facebook is broken.

LinkedIn? Please.

G+. Is that even on any more?

Where do we go to share?

I’m seeing it here on this blog. Traffic is down. Not just day-to-day traffic, but the search hits. That could just be that I’m producing crap content or stuff that no one is interested in. However, Twitter isn’t growing like it once was and there are many reports that it’s shrinking. Facebook is running into problems. So… Where the heck are people going? How do we continue to share without a relatively common communication tool?

I know there’s some push for Yammer. However, lots of people hate it. Slack and slack channels get a little traction, but to a degree this is just another type of online forum and the adoption rate isn’t very high. My family insists on using WhatsApp. However I don’t see lots of us sharing that way. Snapchat is fine if you don’t actually want to talk to lots of people or retain anything. It’s not going to work as a mass medium of disseminating information. I check in regularly with my college age kids to see what apps they’re using to communicate. There’s nothing I haven’t mentioned here, so I’m unsure how the current generation is going to get connected, especially since they think that Twitter is for old people.

In all seriousness, where do we go as an online data professional community to maintain a relatively cohesive set of communications? I’m curious.

Also, if people aren’t reading blogs and they’re not downloading the Books Online, where the heck are they going to learn about SQL Server and Azure? Does anyone know?

Oct 03 2016

Correlated Datetime Columns

SQL Server is a deep and complex product. There’s always more to learn. For example, I had never heard of Correlated Datetime Columns. They were evidently introduced as a database option in SQL Server 2005 to help support data warehousing style queries (frequently using dates and times as join criteria or filter criteria). You can read up on the concept here from this older article from 2008 on MSDN. However, doing a search online I didn’t find much else explaining how this  stuff worked (one article here, that didn’t break this down in a way I could easily understand). Time for me to get my learn on.

The concept is simple: turning this on for your database tells the optimizer that certain datetime columns have a relationship. The example from MSDN uses OrderDate and DueDate from the Purchasing.PurchaseOrderHeader and Purchasing.PurchaseOrderDetail tables respectively. Clearly, the DueDate would be near the OrderDate in general terms. Cool. Makes sense. Let’s see how it works, if it works.

Correlated Datetime Columns At Work

I’m going to use the example code from the Microsoft article, partly because there are aspects of it I don’t understand and exploring it will help me learn. Let’s start with the (slightly modified) base query that we hope will benefit from the Correlated Datetime setting:

SELECT *
FROM Purchasing.PurchaseOrderHeader AS poh
JOIN Purchasing.PurchaseOrderDetail AS pod
   ON poh.PurchaseOrderID = pod.PurchaseOrderID
WHERE poh.OrderDate
BETWEEN '20130801' AND '20130901';

The changes I made were the aliases and the values in the WHERE clause. I left everything else the same. Performance on this was 115ms on average with 59 reads. This is the execution plan:

execution plan with no added index and the setting off

Click to embiggen. The missing index suggestion is on OrderDate using an INCLUDE of all the columns (the joy of SELECT * and Missing Index Hints, I should post a rant on Missing Index Hints at some point). The plan itself uses two Clustered Index Scan operations to retrieve the data. Since the scans are Ordered (check the properties in each operator to validate this, but the lack of a Sort operator is also a good hint), a Merge Join was used to put the data from the two tables together.

The example code drops and recreates the clustered index on the PurchaseOrderDetail table because one of the tables must have a clustered index with a datetime column as the first (or only) column in the key for Correlated Datetime Columns. However, I like to see how things behave. Let’s first create a non-clustered index on PurchaseOrderDetail.

CREATE INDEX IX_PurchaseOrderDetail_DueDate
ON Purchasing.PurchaseOrderDetail(DueDate);
GO

I don’t expect this to do anything for the above query, but I want to test each step. The non-clustered index doesn’t change anything. The execution plan and performance are the same as above. Fine. Let’s enable Correlated Datetime Columns:

ALTER DATABASE AdventureWorks2014
   SET DATE_CORRELATION_OPTIMIZATION ON;
GO

Running the query again, I can’t detect any differences. The execution plans are identical and the performance is the same. In case the plan wasn’t removed from cache by the database setting change (I sure would expect it to be, but I could be wrong), I’m going to remove it from the cache:

DECLARE @PlanHandle VARBINARY(64);

--the test query is an ad hoc batch, so look for it in the query stats DMV,
--not the procedure stats
SELECT  @PlanHandle = deqs.plan_handle
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text LIKE 'SELECT *
FROM Purchasing.PurchaseOrderHeader AS poh%';

IF @PlanHandle IS NOT NULL
    BEGIN
        DBCC FREEPROCCACHE(@PlanHandle);
    END
GO

Nope. It’s not because of a cached plan. The Correlated Datetime Column setting just isn’t digging how I’ve got the clustered index set up. However, let’s also test this. I’ll create an index on the PurchaseOrderHeader.OrderDate as well:

CREATE INDEX IX_PurchaseOrderHeader_OrderDate
ON Purchasing.PurchaseOrderHeader (OrderDate);
GO

Nope. No joy. All right. Clean up the test indexes and follow Microsoft’s lead:

DROP INDEX IX_PurchaseOrderDetail_DueDate
ON Purchasing.PurchaseOrderDetail;
GO
DROP INDEX IX_PurchaseOrderHeader_OrderDate
ON Purchasing.PurchaseOrderHeader;
GO
CREATE UNIQUE NONCLUSTERED INDEX
IX_PurchaseOrderDetail_PurchaseOrderID_PurchaseOrderDetailID 
ON Purchasing.PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID);
GO
ALTER TABLE Purchasing.PurchaseOrderDetail
DROP CONSTRAINT PK_PurchaseOrderDetail_PurchaseOrderID_PurchaseOrderDetailID;
GO
CREATE CLUSTERED INDEX IX_PurchaseOrderDetail_DueDate
ON Purchasing.PurchaseOrderDetail(DueDate);
GO

Rerunning the original query I immediately see a difference. The reads dropped from 59 to 57. The execution time is a little better at 105ms. The execution plan is very different:

execution plan with the clustered index and the setting on

We’re making use of that clustered index and it’s changed to a Hash Join. Let’s see the predicate for the Seek operation:

Seek Keys[1]: Start: [AdventureWorks2014].[Purchasing].[PurchaseOrderDetail].DueDate >= Scalar Operator('2013-07-07 00:00:00.000'), End: [AdventureWorks2014].[Purchasing].[PurchaseOrderDetail].DueDate < Scalar Operator('2013-10-05 00:00:00.000')

So, the optimizer has picked a range of values that are near to the values that have been passed in, assuming a correlation between the data. According to the documentation the optimizer also ensures that using this correlation will still result in the same data being returned.

The original scan of PurchaseOrderHeader is still the same. In fact, we can see that when comparing the plans:

execution plan comparison

You’ll note that I have the similarities highlighted, so you can tell that the Scan is the same. Yes, its cost percentage within the plan has changed, but it’s still the same basic operation.

The magic comes from the optimizer creating statistics in the form of materialized views, which are created and maintained around the data in tables that qualify for the Correlated Datetime Columns behavior. You can even see these among the views in your database. They have a common naming standard as outlined in the documentation: _MPStats_Sys_<constraint_object_id>_<GUID>_<FK_constraint_name> (no word on whether MP is anything as crazy as the WA of system generated statistics).
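If you’re curious what actually got created, the views show up in the catalog like any other view. A minimal sketch; the underscores in the pattern are escaped because underscore is a LIKE wildcard:

--find the materialized views backing the date correlation statistics
SELECT v.name,
       v.create_date
FROM sys.views AS v
WHERE v.name LIKE '[_]MPStats[_]Sys[_]%';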

In short, Correlated Datetime Columns worked… Or did it? We’re comparing apples to hammers at the moment. I’ve got a new clustered index on one of the tables. That changes all the choices, whether or not I’ve changed some database setting. Let’s remove Correlated Datetime Columns, pull the plan from cache, and then rerun the query.
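The cleanup for that step is just the reverse of the earlier setting change, plus clearing the plan out of cache again. A sketch (or reuse the targeted plan handle approach from earlier rather than flushing the whole cache):

ALTER DATABASE AdventureWorks2014
   SET DATE_CORRELATION_OPTIMIZATION OFF;
GO
--test server only; this flushes every plan on the instance
DBCC FREEPROCCACHE;
GO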

Oops. Performance is now at 145ms on average with 118 reads. The execution plan has changed yet again:

execution plan with the clustered index and the setting off

In short, it works.

I could go farther and look to replace the clustered index on the PurchaseOrderHeader table, but you get the idea.

Conclusion

Correlated Datetime Columns works. Clearly it’s not something you’re going to enable on all your databases. Probably most of your databases don’t have clustered indexes on datetime columns, let alone enough tables with correlation between the data stored in them. However, when you do have that type of data correlation, enabling Correlated Datetime Columns and ensuring you have a clustered index on the datetime column is a viable tuning mechanism. Further, this is a mechanism that has been around since 2005. Just so you know, I did all my testing in SQL Server 2016, so this is something that anyone in the right situation can take advantage of. Just remember that TANSTAAFL always applies. Maintaining the statistics needed for the Correlated Datetime Columns is done through materialized views that are automatically created through the optimization process. You can see the views in SSMS and in any queries against the system catalog. You’ll need to take this into account during your statistics maintenance. However, if Correlated Datetime Columns is something you need, it is really going to help with this fairly narrow aspect of query tuning.

Sep 30 2016

Azure Data Platform Resources

A few months ago I created a GitHub repository for the purpose of building and maintaining a list of resources for Azure Data Platform training. You can see it here. My goal in putting this into GitHub instead of just running it on my blog is to make it a community resource. I want all of you to maintain it. If you’re teaching a class (one hour or one week, I don’t care), I’d like you to add yourself to the instructors list. If you have a blog where you post Azure content, please, add your blog. Are you a PowerBI monster? Get on the list. Please, help me create and grow this list so that people have a central, public, resource for this information. More and more of you are moving into Azure all the time. You need to know where to go to gain the knowledge necessary to be successful on the Azure Data Platform. That’s what the Azure Data Platform Learning Resources repository is all about.

Click the link. Edit a page. Do it!

Sep 29 2016

Volunteering With PASS

I was just asked how many times I’ve been to the PASS Summit. This year will be my 12th consecutive one. That made me start thinking. At my very first Summit, I met a couple of volunteers for PASS (Allen Kinsel was one of them, I’ll blame him forever). They were having so much fun that I decided to volunteer. I’ve been volunteering now for eleven years. I couldn’t stop. Here’s what I’ve done over the years at PASS:

Book Review Committee (my first PASS “job”)
Editorial Committee (I wrote stuff)
Editor of the SQL Standard (I had other people write stuff)
First-timer Mentor (I wasn’t very good at this one)
First-timer Event… Chair? (I got up in front of the room, this one might not count)
Founder of SQL Kilt Day (and we all know, this one is the most important)
Ambassador (they help tell you where your room is at Summit, did this several times, it’s fun)
Chapter Leader (hell, Chapter Founder along with some good friends)
SQLSaturday organizer (woof)
Program Committee (single hardest job I’ve done with PASS)
Director (in charge of the Chapters portfolio)
Executive Vice President (currently serving and loving it, even though it’s not easy)

Holy crap. That’s been a busy eleven years. I may have missed one or two positions along the way.

Why am I posting this? Because my engagement, one time, at the PASS Summit, led me to do all these things with this wonderful organization. Quite literally, PASS changed my life for the better. I’ll bet it can for you too. If you’re not attending Summit this year, I encourage you to reconsider. It’s not too late to register. Even if you don’t go to the event, please consider volunteering. If you go to Summit, be sure you engage with as many people as you can. You never know what will happen.

Sep 27 2016

Networking and the PASS Summit

PASS Summit 2016 is fast approaching. If you’re going, time to start making plans. If you’re not going, sell your boss on the idea and get registered. It’s only the largest Microsoft Data Platform event on the planet. There are over 200 sessions given by some of the most knowledgeable people you’ll ever get the chance to learn from. The schedule is posted, just look it over. However, I want to drill down on another aspect of the event that it’s way too easy to miss out on, networking.

Lots of people miss this aspect of events like the PASS Summit. I know I used to. I went to all kinds of IT events over the years, but all I ever did was attend the sessions. I didn’t spend any time at all attempting to meet people. Frankly, if you’re an introvert, as I am, that can be hard (and yes, I really am, I’ve been tested). Even if you work up the nerve to walk up to your favorite author/blogger/speaker and say “Man, I love your stuff” or “I have a question,” you still haven’t actually made contact, you haven’t started to network. In order to start networking, you have to start making a connection to people, and that means engaging.

Why build your network?

That’s easy. Because you’re only as good as your network. I don’t care how smart you are as an individual, you can’t know everything. You’re going to have gaps in your knowledge. Your network is there, in part, to help fill those gaps. You want to make a direct connection to people so that they remember who you are, what you do and what you know. After talking for a while, you’ll get a sense of what people can do. If you get stuck on a networking issue, you might send your new friend an email because they were talking about all the networking stuff they do. Your network expands your skill set. Your network expands your knowledge base. Your network expands your worth.

The amazing thing about the PASS Summit is the unique opportunity it presents for networking. Umpty-thousand of your peers all in one place, geared up & ready to connect, share & learn (heard that somewhere, it sounded good). Add to that the horde of Microsoft engineers that are going to be there (and yeah, you want to network with the Microsoft people too). You won’t get as unique an opportunity any where else.

Your Networking Assignment

You have an assignment. We. We have an assignment.

We’re not going back to our hotel at the end of the last session. If we’re first-timers, we’ll attend the events that are set aside just for us. We’re going to go to the social events. We’re going to chat with people. Pull right up to a table where people are shoveling food into their faces and say “Hey! My name is <insert your name here>. I’m a <insert your job description here>. What do you do?” Substitute your name and job description in the appropriate places. Next, ask this person if there is a session or a speaker they’re excited about. Finally, ask them if they’re going to the keynote. The reason we’re asking all these questions is because people actually love to talk about themselves. It’s a great ice breaker. If you find that you hit it off, arrange to meet at a session or lunch. If not, no big deal.

We’re going to do this at least three times over the week. I promise you, when we leave Summit at the end of the week, we’ll have at least one, real, contact. That’s how we get our network going, direct engagement.

It’s not too late to register. Get it done. Just plan on spending some time talking to people. If no one else, please, track me down and say hello.