Category: SQL Server 2008

Sep 09 2015

Yes, Foreign Keys Help Performance

I created this example several years ago that illustrates how foreign key constraints can help performance. It’s a contrived example. Granted. I feel like it illustrates the point.

However, over the years, people have questioned one aspect of it. The optimizer uses the foreign keys to figure out which tables can be eliminated from the query, making for a more efficient plan and making the query run faster. The pushback has always been, “Yeah, Grant, but nobody writes T-SQL where they include extra tables that they don’t need.”

My initial response, after I stop laughing, is to point out any number of ORM tools. But, you know what, let’s assume that’s correct. No one would ever create a giant catch-all view that has all their JOINs in one place so they don’t have to write them for different queries. Stuff like that never happens. Everyone will only, ever, write a query based on the exact tables they need, so this simplification process of the optimizer is a bad example. I agree.

Please go read through the other post for all the details of the set up. I’m just going to focus on one query in this post.

First, in the example, the SELECT only references the Person table and there is no WHERE clause. So we could completely eliminate all the other tables. But, I still want to illustrate how foreign keys help, so let’s assume, for some reason, we need the JOIN (yes, contrived, but it’s going to illustrate the point, again):

SELECT  p.LastName + ',' + p.FirstName AS PersonName
FROM person.BusinessEntityAddress AS bea
JOIN Person.Person AS p
ON p.BusinessEntityID = bea.BusinessEntityID;
SELECT  p.LastName + ',' + p.FirstName AS PersonName
FROM dbo.MyBusinessEntityAddress AS bea
JOIN dbo.MyPerson AS p
ON p.BusinessEntityID = bea.BusinessEntityID;

The first query goes against a pair of tables with enforced referential constraints. The other query goes against an identical set of tables (indexes are all the same, so is the data and the statistics), but no foreign keys (the final point at the old blog post). If we capture performance metrics on these two queries we see the following:

Execution Time (Avg ms) Reads
Foreign Key 134095.6667 155
No Foreign Key 193937.6667 3929

On average, the second query is about 60ms slower and has 18 times as many reads. Why? Let’s look at the execution plans:


The only difference between these tables is the existence of foreign key constraints, which leads to differences in the estimated number of rows. This occurs because the optimizer knows that the INNER JOIN is going to limit the data coming back to only matching data AND it knows that because there’s an enforced referential constraint, exactly how many rows that will be. Because of this knowledge, it chooses an execution plan that is more efficient.

Foreign keys help performance in SQL Server.

Aug 24 2015

Targeted Plan Cache Removal


A lot of times you’ll hear how people are experiencing sudden, intermittent, poor performance on a query, bad parameter sniffing at work, so they’ll fix it by running the following code:



Yeah, you just nuked the cache on your server because you wanted to take out a single terrorist query. Now, yes, that problematic query is going to recompile and hopefully have a better execution plan. Also, so are all the other queries on your system. That spike in CPU and the slow-down all your business people are experiencing… Your fault for going nuclear.

Instead of a nuke, why not use a sniper rifle to just remove the one problematic plan. Here’s a little piece of code to help out:


SELECT  @PlanHandle = deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.SomeProcedureName');

IF @PlanHandle IS NOT NULL
        DBCC FREEPROCCACHE(@PlanHandle);

Take a look at the documentation for FREEPROCCACHE. You can target specific plans using a plan_handle, a sql_handle, or even a resource governor pool by passing pool_name. I take advantage of that with this query to pull the plan_handle from sys.dm_exec_procedure_stats. You could use T-SQL text too, you’d just have to add in sys.dm_exec_query_text to one of the other DMOs that has the plan_handle or sql_handle such as sys.dm_exec_requests or sys.dm_exec_query_stats. However you choose to do it, you can use a targeted approach to remove plans from cache.

Let’s take the nukes off the table.

Want to really get into talking about the plan cache, query tuning, parameter sniffing and the rest? I have two upcoming all day pre-conference seminars. There’s still room at IT/Dev Connections in Las Vegas on September 14th. Click here to register. Also, at Connections, I’m hoping to be able to read your execution plans, so bring the really scary ones. Later that same week, I’ll be at SQL Saturday San Diego and will do a pre-con there on query tuning. Click here to get your seat.


Aug 10 2015

Generating Estimated Plan and the Plan Cache

Does generating an Estimated Plan cause that plan to be loaded into the plan cache?



What? Still here? You want more? Proof? Fine. Let’s first run this bit of code (but please, not on your production server):


That will remove all plans from cache. Now, let’s take this query and generate an Estimated Plan (CTL-L from your keyboard or by clicking on the “Display Estimated Execution Plan” button on the toolbar):

SELECT * FROM Production.ProductModel AS pm;

This will generate a trivial plan showing a scan against the Production.ProductModel table. Now, let’s run another query:

SELECT  deqs.plan_handle
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text = 'SELECT * FROM Production.ProductModel AS pm;';

That’s just an easy way to see if a plan_handle exists. If a plan was stored in cache for this query, I should see a result. I don’t. Now, you might say that this is because it’s a trivial plan. So, let’s complicate the query a little, add some JOINs and a WHERE clause:

FROM    Production.Product AS p
JOIN    Production.ProductModel AS pm
        ON pm.ProductModelID = p.ProductModelID
JOIN    Production.ProductInventory AS pi
        ON pi.ProductID = p.ProductID
JOIN    Production.Location AS l
        ON l.LocationID = pi.LocationID
WHERE   p.ProductID = 750;

Not the most complicated query around, but, it will go through full optimization because it is not a query that will generate a trivial plan. Let’s generate an Estimated Plan again. Now we’ll modify our query against the cache just a little:

SELECT  deqs.plan_handle
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text = 'SELECT  *
FROM    Production.Product AS p
JOIN    Production.ProductModel AS pm
        ON pm.ProductModelID = p.ProductModelID
JOIN    Production.ProductInventory AS pi
        ON pi.ProductID = p.ProductID
JOIN    Production.Location AS l
        ON l.LocationID = pi.LocationID
WHERE   p.ProductID = 750;'

You won’t see any rows returned. Now, if I execute the query and then re-run the query against the cache, then I see a plan. How about stored procedures? I have a really simple one I use to teach parameter sniffing:

EXEC dbo.spAddressByCity
    @City = N'Mentor';

I’ll generate an Estimated Plan and then query the cache using a slightly different approach:

SELECT  deps.plan_handle
FROM    sys.dm_exec_procedure_stats AS deps
WHERE   deps.object_id = OBJECT_ID('dbo.spAddressByCity');

Again, assuming I’ve cleared the cache and I haven’t executed the procedure, this returns zero rows. That’s because generating an Estimated Plan does not then load that plan into the cache. The plan gets loaded into cache when it gets executed.

I love talking about execution plans and query tuning. If you do too, we two opportunities to get together. The first is in Las Vegas at the Connections conference. Click here to register. If you bring your execution plans to Vegas, I’ll try to make some time during the session to use your plans to demonstrate to the crowd. The second opportunity will be down in San Diego, the day before their SQL Saturday event.

Aug 06 2015

Reading Your Execution Plans

I’m putting the finishing touches on the all day seminar on execution plans that I’ll be putting on at DevConnections this year. I have tons of examples to show you how to interpret execution plans as part of query performance tuning. We’ll be able to fill our time. However, I’ve got a pretty high degree of flexibility on which examples I use. With that in mind, I’d love it if you could bring in your own execution plans that you’d like some help reading. I won’t guarantee a solution for your query problems, but I promise to read through it and make some suggestions.

If you have an execution plan that you don’t mind sharing with the group and you’d like a little consulting time on tuning the query, bring it with you to Las Vegas.

Aug 03 2015

Differences Between Actual & Estimated Plans

I have, in the past, made way too much of the need for Actual Plans when doing performance troubleshooting. The primary reason for this is to get the Actual Plan in order to see the differences between the Actual and Estimated Row Counts as a means of understanding how the optimizer saw the data. But, is that the only thing that’s different between Actual & Estimated Plans? Well, pretty much, yeah.

I took two fairly average execution plans from SQL Server 2014 and ran them through Altova’s XML Spy, which does XML comparisons similar to how Redgate SQL Compare will compare two data structures for you. Here is every single difference I found. Everything was additional information in the Actual Plan.

In the information for the first operator, in my case, a SELECT operator, in the QueryPlan element, two properties:


In the MemoryGrantInfo element, six properties:


Then, in all the operators, in each RelOp element in the XML of the execution plan, one additional XML element, RuntimeInformation with these properties:


This data is not exhaustive. I used a relatively benign query (it hit about 10 tables, had some suggested missing indexes, and about 40 operators all told). I’m sure if I tried lots of different types of queries, I might find a few additional differences. But for your average, “Hey, Let’s Tune This”, query, we’re looking at one additional element in the operators and twelve (12) additional properties, only four of which are in the majority of the operators.

Don’t get me wrong. If I have the choice, I want to see the Actual Plan, because that Actual/Estimated row count and execution count comparison are important. However, they are not the be all, end all, of reading execution plans and query tuning. This means, if you can get the Estimated Plan, it’s probably good enough most of the time. Heck, you can look at the estimates and then look at the data and statistics directly to get a sense if they’re accurate or not. The only time you’re likely to see any differences (other than those noted above) between an Actual and Estimated Plan is in the event of a recompile. It’s also worth noting, the plan you’re going to retrieve from the cache is going to be an Estimated Plan.

Do not be afraid of the Estimated Plan.


If you want to spend a lot of quality time with me, talking execution plans and query tuning, I’ve got a couple of different all day pre-conference seminars coming up. The first is in Las Vegas at the Connections conference. Click here to register. The second will be down in San Diego, the day before their SQL Saturday event.

Jul 29 2015

Hey Kids! Let’s Put on a Show at the Old Barn

Alternate Title: I’m traveling a bunch. Let’s get together and talk.

A bunch of trips and presentations coming up, so I thought I’d share. First, I’ll be SQL Saturday Omaha for my first time ever presenting in Nebraska. I’m excited to add this state to my list (which is almost over 40 now). If you’re not doing anything August 15th, let’s have a chat. Next, fingers crossed, I’ll get selected to fly back to my home state, Oklahoma, to go to SQL Saturday OKC. These guys put on a great event and hey, it’s Oklahoma so how can it be bad. I hope they announce soon. I need to schedule my flights. This one is on August 29th.

September also has several events. First, I’ll be at SQL Saturday Las Vegas on the 12th of September. My first time at this event so I’m looking forward to it. That’s followed immediately by a pre-con and a couple of sessions at ITDev Connections. I love inter-disciplinary conferences because it gives you a chance to branch out and learn more knowledge across the stack. It’s extremely useful if you’re getting into DevOps (which you should be). My pre-con is on query tuning and execution plans, using one to help the other. This is a great event with a lot of excellent speakers. Go here to register. But I’m not done. Down in San Diego, at their SQL Saturday on the 19th, I’ll also be doing a pre-con on query performance tuning on the 18th. I’ll see if I can’t squeeze some 2016 stuff in there too. Click here now to register for the precon.

But we’re not done. In October, I’m going to hop the pond for one my absolute favorite events, Red Gate’s very own SQL in the City: London. It’s a great event with a great collection of speakers. I won’t lie, we’re going to focus on Database Lifecycle Management (DLM), but there’ll be sessions on query tuning and all sorts of other stuff as well. It’s focused on the Red Gate tools and, let’s face it, it’s a Red Gate style event, so it’s fun and entertaining. If you really do want to drill down on DLM, there’s also the all day seminars that Red Gate is hosting that week. We’ve put a lot of work into these and I think they’re some pretty amazing hands on classes that will get you started automating your own database deployments. Also, while I’m over there, I think I may find my way to a SQL Relay event, so eyes peeled.

Later in October, well, it’s the PASS Summit. ‘Nough said. I’m doing a session focused on the beginner that’s all around statistics. They’re so important to everything you do with queries, that it’s a good idea to spend some time understanding them before you get into all the craziness of query tuning. If you’re attending Summit, please stop by my session. Click here to register for the PASS Summit. Fingers cross, I might be at SQL Saturday Portland if I get accepted.

One more that I don’t mean to leave off, but I think it deserves it’s own paragraph. Right before the PASS Summit, we’re bringing SQL in the City back to Seattle. That’s right, another shot at seeing the great stuff we’re going to bringing to London, plus a few different things (we mix up the speakers and sessions to help keep it all fresh).

Jul 20 2015

SQL Server Management Studio – Footloose and Fancy Free

That’s right. There’s been a divorce. SQL Server Management Studio (SSMS) has been divorced from the server product. In fact, Microsoft is inviting you to the new SSMS coming out party.

I’m pretty excited about this. While I’m very comfortable in SSMS, to a large degree, it’s like that old pair of jeans that you’ve worn for the last 10 years. They’re comfortable too. Well, maybe a little tight when you pull them on out of the wash. One of the knees is gone. The legs are frayed so much it almost looks intentional. You just noticed a hole in the bottom. The zipper is acting up… Yeah, OK. These jeans have had it. So has SSMS.

The plan from Microsoft is to upgrade SSMS independently from the boxed product. In fact, since one of the goals is to coordinate functionality within SSMS with releases of Azure SQL Database, I think it’s pretty safe to say that we’ll be seeing very regular and frequent updates. While this is a good thing for SSMS and it’s a good thing for Azure SQL Database. This level of frequent updates might not be a good thing for individual data pros if you’re not used to actively pursuing new knowledge and new tooling all the time. Or, it can act as an incentive to help keep your knowledge levels up to date because you’re more employable if you’re maintaining your skill set.

Let this act as a prod to go out and learn some new functionality. Oh, and buy a new pair of jeans.


Jul 06 2015

But I Don’t Have a Test Server…

I frequently see statements on forums along the lines of “I don’t have a test server, so I’m going to do something I’ve never done before directly on my production servers…” This is usually followed by questions along the lines of:

  • But that’s OK and I won’t get fired, right?
  • So how do I know if it worked?
  • Is that dangerous?

The replies are:

  • No. You should be fired
  • You won’t
  • By All the Gods! Yes! It’s stupidly, insanely, I don’t really like my production server, my data, my boss, or my job, dangerous. Yes.

I understand. You’re working for a not-for-profit, so you can’t afford tons of new servers. You’re looking at a 50tb production server of which, you can’t make a copy. Your bosses just don’t want to spend the money on servers for testing. So these are valid reasons to just skip any semblance of testing or assurance that what you’re doing is good, bad or indifferent for your servers, right? No. No, no, no. 10,000 times, no.

If you’re doing something for the very first time in production, you’re going about your job wrong.

You need to set up a test machine of some type, even if it’s just a tiny VM on your laptop. At a bare minimum, this test machine should have a copy of the structure of your production database. This means you can at least validate any script that’s going to modify structures will compile and run without error. After that, you should strive to have a representation of production data. This doesn’t have to be an exact copy. In fact, it shouldn’t be an exact copy. You should clean any sensitive data prior to using it for testing. Barring that, you should at have, again, this is part of the bare minimum, a copy of your production statistics applied to this test database.

To get a copy of statistics from within SQL Server Management Studio (SSMS), right click on the database in question and select Tasks. Within the Tasks drill down menu select Generate Scripts. You’ll get a simple wizard. Script the entire database. On the Set Scripting Options screen, you’re going to choose Advanced. There, you’ll scroll down until you see Script Statistics. By default, this will be disabled. You’re going to select Script Statistics and Histogram. From there, generate your database script. You can now run this script to create a copy of your production database.

After you create your brand new test database, I’d suggest turning off Auto Update Statistics. If you get a statistics update event to fire through any of the automatic means, it’s going to see that you have no data and will replace your production statistics with empty ones. Done.

You now have the bare minimum of a test database and you no longer have any excuses for not validating your scripts prior to running them on production.

Want to talk in person about query tuning? I’m doing two different all day pre-conference seminars. The first is in Las Vegas at Connections on September 14th. Click here now to register. The second is at SQL Saturday San Diego on September 18th. Let’s get together and talk about query tuning and execution plans.

May 26 2015

Database Engine Tuning Advisor

I would love to see the Database Engine Tuning Advisor (DTA) pulled from the product. Completely. Heck, I feel bad that I included a chapter on it in my query tuning book (all updated for SQL Server 2014 by the way). Let me tell you why we need to pull this tool.

First, I understand its purpose. It’s supposed to be a fast and easy way to get some performance tuning done for people who just don’t have the time or knowledge to go through the full process of gathering metrics, evaluating poor performers, understanding root causes and applying indexes to fix those causes. I also readily acknowledge that it actually is an amazing piece of software. If you don’t agree with that, go read this white paper. With those acknowledgements in place, I still think there are two distinct problems with the approach and a serious fundamental issue with the premise. Let’s talk about the problems of the approach first.

Many, probably even most, of your query performance problems are in your T-SQL. Some of your query performance problems are in your table structure. Some of your query performance problems are in your indexes and statistics, but not a majority. The DTA can suggest indexes, statistics, materialized views, and partitioning (partitioning for performance I might add, not data management, and that opens up a completely different can of worms we can’t address here, I don’t have time). In short, it can’t address your number one problem, your code. Since it can’t address the number one problem, all it can do is work around the edges. You’ll run this thing, think you’ve addressed your issues, but your issues are still there and now you’re just as stuck as before you found the DTA and ran it. It just doesn’t fix the core issue, your code.

The second problem I see with it is that it doesn’t have enough intelligence built into it’s functionality. I say that with some reluctance because it really is a pretty amazing piece of functional code. But it suffers from a couple of shortcomings there. It’s completely dependent on the load provided to it. If that load is off, it’s recommendations are off because it just can’t have the intelligence to recognize that a poorly representative load has been provided. This lack of intelligence is supposed to be offset by the person running the DTA to ensure that they are gathering the right information and that they can interpret and test the resultant recommendations. Which brings us to the fundamental issue with the premise.

This is supposed to be run by people with no internals knowledge. Right? BUT! These people are also supposed to make a judgement based on the recommendations whether or not they should be accepted. Further, they should test all the recommendations prior to applying them to their production server. They also must gather a well structured and meaningful representative load from their production system in order to supply the DTA with good information. Further, the very sophisticated set of tests around the DTA actually makes a fundamental assumption that could be radically off, that the person designing the database has done a good and thorough job of correctly normalizing the structures. You agree with all these assumptions on the part of the DTA? Am I alone in thinking that we have a problem here? If people lack any understanding of the internals they won’t be able to judge those recommendations. If people don’t have the ability to gather and interpret performance metrics they won’t be able to test the suggestions of the DTA or provide it with the necessary test load. In short, the DTA can’t be relied on to solve the problem it’s supposed to solve because of the root cause of that problem, peoples lack of knowledge and understanding.

I won’t even get into finding indexes with names like this:


Instead of evaluating the suggestions made by the DTA and applying just those that make sense and will have a positive impact, people assume that every single suggestion from the tool is Gospel. They apply them all, without thinking, without knowledge, without appreciation of the possibility, sometimes even the likelihood, of serious negative impact.

I recognize that many people are stuck. They have a SQL Server instance that’s causing them pain and they don’t have the knowledge necessary to fix it. Further, I know a few of you have used this tool successfully in some situations. I agree that there ought to be some way to mechanically and automatically tune the server. However, the DTA is not that tool, despite it’s clear and obvious sophistication. Let’s get rid of it.

Want to learn how to tune queries? I’m putting on an all day seminar at Connections in September. Click here right now to register. We won’t use the DTA.

May 05 2015

Effects of Persisted Columns on Performance

I live for questions. And my favorite questions are the ones where I’m not completely sure of the answer. Those are the questions that make me stop presenting in order to take a note so I can try to answer the question later, usually in a blog post. Guess where we are today?

I was asked at SQL Bits in London about the direct impact of the PERSISTED operator on calculated columns, both inserts and selects. I didn’t have a specific answer, so I wrote it down for later (and asked the, self-described, persisting Dane, to email me to remind me. He did, so I put together a few tests to try to answer his question.

First, I created three tables:

CREATE TABLE dbo.PersistTest (
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED

CREATE TABLE dbo.PersistTest2 (
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2)

CREATE TABLE dbo.PersistTest3 (
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED

The first table has two calculated columns, one PERSISTED and one not. The second table has a non-persisted column only. The third table has a persisted column only. This way, I can see the behavior across all these tables and compare them to see where PERSISTED adds to, or removes from, performance.

I took a very simple set of queries and ran these a few times on each table to get an average execution time:

INSERT dbo.PersistTest (
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal

The results were:

Persist Persist 2 Persist 3
3580.5 1626 2260.5

On average, across about five executions, you can see a distinct difference of about 600 microseconds between Persist 2 and Persist 3, and an even larger average on the single Persist table. Let’s try the same thing with a few SELECT statements:

SELECT  pt.CalcValue,
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
SELECT  pt.CalcValue
FROM    dbo.PersistTest2 AS pt
WHERE   pt.PersistTestID = 3;
SELECT  pt.PersistValue
FROM    dbo.PersistTest3 AS pt
WHERE   pt.PersistTestID = 3;

While the results are spectular:

Persist Persist 2 Persist 3
115.5 123.5 109.5

You can see that making the calculation on the fly is more costly with both Persist and Persist2 being higher than Persist3, which had no calculations to retrieve data.

Now, let’s run the same thing, but, how about, 10,000 times so we can really see what the differences are over time. First, the inserts:

Persist Persist 2 Persist 3
84854 68783 73817

Still measured in microseconds, you can see that the cumulative totals are higher for the persisted columns than they are for the non-persisted column. So, there is an overhead for storing the extra information. But, what about queries?

Persist Persist 2 Persist 3
Average 221.25 222.16 100.54
Cumulative 2203431 2254238 1005469

I’d say that’s pretty conclusive. When I hit the SELECT statement 10,000 times instead just five or six, we clearly have differences in behavior. Interestingly enough, the logical reads were identical at 20,000 each. So it really did come down to the time spent calculating the values versus the time spent just retrieving them.

Caveats. This is a very simple test with simple calculations and small persisted values. Even doing something 10,000 times isn’t the same as doing it one million or one hundred million. In short, your mileage may vary, some assembly required, for queries lasting longer than four hours call your physician, it depends.

You can see that a PERSISTED column is going to negatively impact INSERT, but, it’s going to positively impact SELECT. The question is, which one is giving you pain at the moment.

UPDATE: Transposed the data in the final table between Persist 2 & Persist 3. Fixed it.