Yes, Foreign Keys Help Performance

I created this example several years ago that illustrates how foreign key constraints can help performance. It’s a contrived example. Granted. I feel like it illustrates the point.

However, over the years, people have questioned one aspect of it. The optimizer uses the foreign keys to figure out which tables can be eliminated from the query, making for a more efficient plan and making the query run faster. The pushback has always been, “Yeah, Grant, but nobody writes T-SQL where they include extra tables that they don’t need.”

My initial response, after I stop laughing, is to point out any number of ORM tools. But, you know what, let’s assume that’s correct. No one would ever create a giant catch-all view that has all their JOINs in one place so they don’t have to write them for different queries. Stuff like that never happens. Everyone will only, ever, write a query based on the exact tables they need, so this simplification process of the optimizer is a bad example. I agree.

Please go read through the other post for all the details of the set up. I’m just going to focus on one query in this post.

First, in the example, the SELECT only references the Person table and there is no WHERE clause. So we could completely eliminate all the other tables. But, I still want to illustrate how foreign keys help, so let’s assume, for some reason, we need the JOIN (yes, contrived, but it’s going to illustrate the point, again):

SELECT  p.LastName + ',' + p.FirstName AS PersonName
FROM person.BusinessEntityAddress AS bea
JOIN Person.Person AS p
ON p.BusinessEntityID = bea.BusinessEntityID;
GO
SELECT  p.LastName + ',' + p.FirstName AS PersonName
FROM dbo.MyBusinessEntityAddress AS bea
JOIN dbo.MyPerson AS p
ON p.BusinessEntityID = bea.BusinessEntityID;
GO

The first query goes against a pair of tables with enforced referential constraints. The other query goes against an identical set of tables (indexes are all the same, so is the data and the statistics), but no foreign keys (the final point at the old blog post). If we capture performance metrics on these two queries we see the following:

Execution Time (Avg ms) Reads
Foreign Key 134095.6667 155
No Foreign Key 193937.6667 3929

On average, the second query is about 60ms slower and has 18 times as many reads. Why? Let’s look at the execution plans:

ForeignKeysHelpPerformance

The only difference between these tables is the existence of foreign key constraints, which leads to differences in the estimated number of rows. This occurs because the optimizer knows that the INNER JOIN is going to limit the data coming back to only matching data AND it knows that because there’s an enforced referential constraint, exactly how many rows that will be. Because of this knowledge, it chooses an execution plan that is more efficient.

Foreign keys help performance in SQL Server.

16 thoughts on “Yes, Foreign Keys Help Performance

  • Dave Wentzel

    I believe the caveat is that the FK must be trusted. I did a script to prove this years ago so it might have changed in recent versions and I don’t work with SQL Server anymore to repro it. Another great reason to use DRI.

  • Michael Rybicki

    Doesn’t specifying and INNER JOIN on the second query essentially do the same thing? And if so, and I do that in all my queries (INNER or LEFT), is there any performance benefit to foreign keys?

    • Great question. The answer is no. If they are disabled, then they are not trusted and the optimizer can’t use them. Same thing if they’ve been created (or updated) with the NOCHECK option. No trust = No performance enhancements.

  • Greg

    Based on the sentence “This occurs because the optimizer knows that the INNER JOIN is going to limit the data coming back to only matching data AND it knows that because there’s an enforced referential constraint, exactly how many rows that will be” can this technique also be leveraged if table A has a FK to table B and table B has a FK (1-to-1) to table C but table B isn’t needed in the query output? For example: table A is Lecture, table B is Professor and table C is Person. In this example, a Professor is a Person and contains columns unique to professors. Also, by having a FK to this table, a column can only contain the ID of a professor rather than “any person.” If a query needs columns from tables A and C (but not B), could it be defined with the JOIN from A to C (skipping B) and benefit from the optimizer “knowing” the structure? Or, should there be 2 JOINs (A to B, then B to C) even though table B isn’t really needed, in order for the benefit to be realized?

    • I don’t know.

      I would have to test it to be sure. However, I believe (and I’m pretty sure I’m right), that leaving tables out of the query means that anything on, or about, those tables, will be ignored by the optimizer. It doesn’t read through all the dependencies to tables not included in the query to attempt to add them, or reference constraints they may have. So, I’d say, no, it doesn’t. However, testing would be the one way to know for certain.

      The opposite, elimination of tables not needed to satisfy the query, that is part of the optimization process. And having constraints in place can help with that.

      However, as to your question, I just don’t know without setting up the tests.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.