Yes, Foreign Keys Help Performance

09Sep 2015 by Grant Fritchey 16 Comments

I created this example several years ago that illustrates how foreign key constraints can help performance. It’s a contrived example. Granted.Â I feel like it illustrates the point.

However, over the years, people have questioned one aspect of it. The optimizer uses the foreign keys to figure out which tables can be eliminated from the query, making for a more efficient plan and making the query run faster. The pushback has always been, “Yeah, Grant, but nobody writes T-SQL where they include extra tables that they don’t need.”

My initial response, after I stop laughing, is to point out any number of ORM tools. But, you know what, let’s assume that’s correct. No one would everÂ create a giant catch-all view that has all their JOINs in one place so theyÂ don’t have to write them for different queries.Â Stuff like that never happens. Everyone will only, ever, write a query based on the exact tables they need, so this simplification process of the optimizer is aÂ bad example. I agree.

Please go read through the other post for all the details of the set up. I’m just going to focus on one query in this post.

First, in the example, the SELECT only references the Person table and there is no WHERE clause. So we could completely eliminate all the other tables. But, I still want to illustrate how foreign keys help, so let’s assume, for some reason, we need the JOIN (yes, contrived, but it’s going to illustrate the point, again):

SELECT  p.LastName + ',' + p.FirstName AS PersonName
FROM person.BusinessEntityAddress AS bea
JOIN Person.Person AS p
ON p.BusinessEntityID = bea.BusinessEntityID;
GO
SELECT  p.LastName + ',' + p.FirstName AS PersonName
FROM dbo.MyBusinessEntityAddress AS bea
JOIN dbo.MyPerson AS p
ON p.BusinessEntityID = bea.BusinessEntityID;
GO

The first query goes against a pair of tables with enforced referential constraints. The other query goes against an identical set of tables (indexes are all the same, so is the data and the statistics), but no foreign keys (the final point at the old blog post). If we capture performance metrics on these two queries we see the following:

	Execution Time (Avg ms)	Reads
Foreign Key	134095.6667	155
No Foreign Key	193937.6667	3929

On average, the second query is about 60ms slower and has 18 times as many reads.Â Why? Let’s look at the execution plans:

The only difference between these tables is the existence of foreign key constraints, which leads to differences in the estimated number of rows. This occurs because the optimizer knows that the INNER JOIN is going to limit the data coming back to only matching data AND it knows that because there’s an enforced referential constraint, exactly how many rows that will be. Because of this knowledge, it chooses an execution plan that is more efficient.

Foreign keys help performance in SQL Server.

16 thoughts on “Yes, Foreign Keys Help Performance”

Dave Wentzel

I believe the caveat is that the FK must be trusted. I did a script to prove this years ago so it might have changed in recent versions and I don’t work with SQL Server anymore to repro it. Another great reason to use DRI.

September 9, 2015 at 9:48 pm
Reply
Grant Fritchey

Oh yeah, absolutely. If it’s been created using WITH NOCHECK all bets are off.

September 9, 2015 at 10:22 pm
Reply
Michael Rybicki

Doesn’t specifying and INNER JOIN on the second query essentially do the same thing? And if so, and I do that in all my queries (INNER or LEFT), is there any performance benefit to foreign keys?

September 14, 2015 at 7:10 am
Reply
Grant Fritchey

Nope. The JOIN outlined above is an INNER JOIN. I just didn’t add the word. It’s not necessary. It’s about row limits. Adding the INNER keyword wouldn’t change this at all.

September 14, 2015 at 8:33 am
Reply
(SFTW) SQL Server Links 18/09/15 - John Sansom

[…] Yes, Foreign Keys Help PerformanceÂ –Â Grant Fritchey (Blog|Twitter) […]

September 18, 2015 at 5:55 am
Reply
BPOTW 2015-09-18 | SQL Notes From The Underground

[…] PowerShell Variables for Admins Who Fear Change Automated Metadata Report for SSAS Tabular Yes, Foreign Keys Help Performance When to Quote in PowerShell How to Confuse the SQL Server Query Optimizer HammerDB Performance […]

September 21, 2015 at 4:55 am
Reply
Andrew Dale

the second query is about 60s slower not 60ms

September 22, 2015 at 4:54 am
Reply
Grant Fritchey

Those numbers are microseconds from Extended Events.

September 22, 2015 at 9:46 am
Reply
Greg Hayes

Should Foreign Keys be indexed as well to improve performance?

September 24, 2015 at 8:55 pm
Reply
Grant Fritchey

Good question. Unfortunately, the answer is, it depends. I’ll say they are certainly a great candidate for indexing.

September 24, 2015 at 10:37 pm
Reply
annlewkowicz

What is the impact if the foreign keys have constraints disabled? Do they still make a (positive) difference to query performance?

February 25, 2019 at 12:55 pm
Reply
- Grant Fritchey
  
  Great question. The answer is no. If they are disabled, then they are not trusted and the optimizer can’t use them. Same thing if they’ve been created (or updated) with the NOCHECK option. No trust = No performance enhancements.
  
  February 26, 2019 at 3:42 am
  Reply
Greg

Based on the sentence “This occurs because the optimizer knows that the INNER JOIN is going to limit the data coming back to only matching data AND it knows that because thereâ€™s an enforced referential constraint, exactly how many rows that will be” can this technique also be leveraged if table A has a FK to table B and table B has a FK (1-to-1) to table C but table B isn’t needed in the query output? For example: table A is Lecture, table B is Professor and table C is Person. In this example, a Professor is a Person and contains columns unique to professors. Also, by having a FK to this table, a column can only contain the ID of a professor rather than “any person.” If a query needs columns from tables A and C (but not B), could it be defined with the JOIN from A to C (skipping B) and benefit from the optimizer “knowing” the structure? Or, should there be 2 JOINs (A to B, then B to C) even though table B isn’t really needed, in order for the benefit to be realized?

April 15, 2021 at 1:56 am
Reply
- Grant Fritchey
  
  I don’t know.
  
  I would have to test it to be sure. However, I believe (and I’m pretty sure I’m right), that leaving tables out of the query means that anything on, or about, those tables, will be ignored by the optimizer. It doesn’t read through all the dependencies to tables not included in the query to attempt to add them, or reference constraints they may have. So, I’d say, no, it doesn’t. However, testing would be the one way to know for certain.
  
  The opposite, elimination of tables not needed to satisfy the query, that is part of the optimization process. And having constraints in place can help with that.
  
  However, as to your question, I just don’t know without setting up the tests.
  
  April 15, 2021 at 9:07 am
  Reply
Greg

Thank you for the quick response, especially to an older post. If I can set up a useful test, I’ll share the results.

April 15, 2021 at 11:06 am
Reply
- Grant Fritchey
  
  Please do. I’d love to know for certain. I believe my guess to be right, but I’d rather validate it than suggest something that is blatantly wrong.
  
  April 15, 2021 at 11:08 am
  Reply