I created this example several years ago that illustrates how foreign key constraints can help performance. It’s a contrived example. Granted. I feel like it illustrates the point.
However, over the years, people have questioned one aspect of it. The optimizer uses the foreign keys to figure out which tables can be eliminated from the query, making for a more efficient plan and making the query run faster. The pushback has always been, “Yeah, Grant, but nobody writes T-SQL where they include extra tables that they don’t need.”
My initial response, after I stop laughing, is to point out any number of ORM tools. But, you know what, let’s assume that’s correct. No one would ever create a giant catch-all view that has all their JOINs in one place so they don’t have to write them for different queries. Stuff like that never happens. Everyone will only, ever, write a query based on the exact tables they need, so this simplification process of the optimizer is a bad example. I agree.
Please go read through the other post for all the details of the set up. I’m just going to focus on one query in this post.
First, in the example, the SELECT only references the Person table and there is no WHERE clause. So we could completely eliminate all the other tables. But, I still want to illustrate how foreign keys help, so let’s assume, for some reason, we need the JOIN (yes, contrived, but it’s going to illustrate the point, again):
SELECT p.LastName + ',' + p.FirstName AS PersonName FROM person.BusinessEntityAddress AS bea JOIN Person.Person AS p ON p.BusinessEntityID = bea.BusinessEntityID; GO SELECT p.LastName + ',' + p.FirstName AS PersonName FROM dbo.MyBusinessEntityAddress AS bea JOIN dbo.MyPerson AS p ON p.BusinessEntityID = bea.BusinessEntityID; GO
The first query goes against a pair of tables with enforced referential constraints. The other query goes against an identical set of tables (indexes are all the same, so is the data and the statistics), but no foreign keys (the final point at the old blog post). If we capture performance metrics on these two queries we see the following:
Execution Time (Avg ms) | Reads | |
Foreign Key | 134095.6667 | 155 |
No Foreign Key | 193937.6667 | 3929 |
On average, the second query is about 60ms slower and has 18 times as many reads. Why? Let’s look at the execution plans:
The only difference between these tables is the existence of foreign key constraints, which leads to differences in the estimated number of rows. This occurs because the optimizer knows that the INNER JOIN is going to limit the data coming back to only matching data AND it knows that because there’s an enforced referential constraint, exactly how many rows that will be. Because of this knowledge, it chooses an execution plan that is more efficient.
Foreign keys help performance in SQL Server.
I believe the caveat is that the FK must be trusted. I did a script to prove this years ago so it might have changed in recent versions and I don’t work with SQL Server anymore to repro it. Another great reason to use DRI.
Oh yeah, absolutely. If it’s been created using WITH NOCHECK all bets are off.
Doesn’t specifying and INNER JOIN on the second query essentially do the same thing? And if so, and I do that in all my queries (INNER or LEFT), is there any performance benefit to foreign keys?
Nope. The JOIN outlined above is an INNER JOIN. I just didn’t add the word. It’s not necessary. It’s about row limits. Adding the INNER keyword wouldn’t change this at all.
[…] Yes, Foreign Keys Help Performance – Grant Fritchey (Blog|Twitter) […]
[…] PowerShell Variables for Admins Who Fear Change Automated Metadata Report for SSAS Tabular Yes, Foreign Keys Help Performance When to Quote in PowerShell How to Confuse the SQL Server Query Optimizer HammerDB Performance […]
the second query is about 60s slower not 60ms
Those numbers are microseconds from Extended Events.
Should Foreign Keys be indexed as well to improve performance?
Good question. Unfortunately, the answer is, it depends. I’ll say they are certainly a great candidate for indexing.
What is the impact if the foreign keys have constraints disabled? Do they still make a (positive) difference to query performance?
Great question. The answer is no. If they are disabled, then they are not trusted and the optimizer can’t use them. Same thing if they’ve been created (or updated) with the NOCHECK option. No trust = No performance enhancements.
Based on the sentence “This occurs because the optimizer knows that the INNER JOIN is going to limit the data coming back to only matching data AND it knows that because there’s an enforced referential constraint, exactly how many rows that will be” can this technique also be leveraged if table A has a FK to table B and table B has a FK (1-to-1) to table C but table B isn’t needed in the query output? For example: table A is Lecture, table B is Professor and table C is Person. In this example, a Professor is a Person and contains columns unique to professors. Also, by having a FK to this table, a column can only contain the ID of a professor rather than “any person.” If a query needs columns from tables A and C (but not B), could it be defined with the JOIN from A to C (skipping B) and benefit from the optimizer “knowing” the structure? Or, should there be 2 JOINs (A to B, then B to C) even though table B isn’t really needed, in order for the benefit to be realized?
I don’t know.
I would have to test it to be sure. However, I believe (and I’m pretty sure I’m right), that leaving tables out of the query means that anything on, or about, those tables, will be ignored by the optimizer. It doesn’t read through all the dependencies to tables not included in the query to attempt to add them, or reference constraints they may have. So, I’d say, no, it doesn’t. However, testing would be the one way to know for certain.
The opposite, elimination of tables not needed to satisfy the query, that is part of the optimization process. And having constraints in place can help with that.
However, as to your question, I just don’t know without setting up the tests.
Thank you for the quick response, especially to an older post. If I can set up a useful test, I’ll share the results.
Please do. I’d love to know for certain. I believe my guess to be right, but I’d rather validate it than suggest something that is blatantly wrong.