Nov 22 2010

Do Foreign Key Constraints Help Performance?

Most people are very aware of the fact that having a foreign key constraint in place on your tables adds overhead to inserts and deletes. And many people believe that there is no benefit to foreign keys beyond referential integrity (which, the application can handle perfectly well, right?). But is that an accurate statement? Here’s the basis for our investigation, a query in AdventureWorks2008R2:

[sourcecode language="sql"]SELECT p.LastName + ‘, ‘ + p.FirstName AS ‘PersonName’
FROM Person.Address AS a
JOIN Person.BusinessEntityAddress AS bea
ON a.AddressID = bea.AddressID
JOIN Person.BusinessEntity AS be
ON bea.BusinessEntityID = be.BusinessEntityID
JOIN Person.Person AS p
ON be.BusinessEntityID = p.BusinessEntityID;[/sourcecode]

This query results in the following execution plan:

ForeignKeyExecPlan

I know that is an ugly query and an ugly query plan, but bear with me for a moment. Do you notice anything about the query and the plan at this point? Count the tables and the access operators. That’s four tables and two access operators. That means, that despite the fact that I listed four tables in the query, the optimizer was smart enough to figure out that it only needed to pull data from two of the tables and completely ignored the others because, the key values in the Person.Person table and the Person.BusinessEntityAddress table were the same. It didn’t even bother with the Address table since nothing from that table is in the WHERE or SELECT and, it skipped the BusinessEntity table because, the foriegn key relationships ensure that the data in the other tables can be trusted.

Don’t believe me? For those trying this at home, run this script in AdventureWorks2008R2:

[sourcecode language="sql"]SELECT *
INTO dbo.MyAddress
FROM Person.Address;

SELECT *
INTO dbo.MyBusinessEntityAddress
FROM Person.BusinessEntityAddress;

SELECT *
INTO dbo.MyBusinessEntity
FROM Person.BusinessEntity;

SELECT *
INTO dbo.MyPerson
FROM Person.Person;[/sourcecode]

Then modify the original query so that it looks like this:

[sourcecode language="sql"]SELECT p.LastName + ‘, ‘ + p.FirstName AS ‘PersonName’
FROM dbo.MyAddress AS a
JOIN dbo.MyBusinessEntityAddress AS bea
ON a.AddressID = bea.AddressID
JOIN dbo.MyBusinessEntity AS be
ON bea.BusinessEntityID = be.BusinessEntityID
JOIN dbo.MyPerson AS p
ON be.BusinessEntityID = p.BusinessEntityID[/sourcecode]

This query will produce the following execution plan:

NoFKExecPlan

That is the exact same query running against exact copies of the tables from the original, yet, now, instead of two operators and one join operation, we’re looking at four operators and three join operations. That my friends is the optimizer taking advantage of the fact that there are foreign keys in place that ensure trust in the data which enables the optimzer to eliminate unnecessary tables from the plan. Not only do the execution plans differ, but the execution time was about 1/6 slower, consistently, in the second query as it did all kinds of reads against the two tables that were eliminated in the original query plan.

I know some of you are saying, “Well, we can just put indexes on the tables to fix that problem, we still don’t need constraints.” OK. Let’s check it out. Here’s a script to put indexes in place, which will surely fix the heinous execution plan above:

[sourcecode language="sql"]ALTER TABLE dbo.MyAddress ADD  CONSTRAINT PK_MyAddress_AddressID PRIMARY KEY CLUSTERED
(
    AddressID ASC
)

CREATE NONCLUSTERED INDEX IX_MyBusinessEntityAddress_AddressID ON dbo.MyBusinessEntityAddress
(
    AddressID ASC
)

ALTER TABLE dbo.MyBusinessEntityAddress ADD  CONSTRAINT PK_MyBusinessEntityAddress_BusinessEntityID_AddressID_AddressTypeID PRIMARY KEY CLUSTERED
(
    BusinessEntityID ASC,
    AddressID ASC,
    AddressTypeID ASC
)

ALTER TABLE dbo.MyBusinessEntity ADD  CONSTRAINT PK_MyBusinessEntity_BusinessEntityID PRIMARY KEY CLUSTERED
(
    BusinessEntityID ASC
)
USE [AdventureWorks2008R2]
GO

ALTER TABLE dbo.MyPerson ADD  CONSTRAINT PK_Person_BusinessEntityID PRIMARY KEY CLUSTERED
(
    BusinessEntityID ASC
)[/sourcecode]

After building out the indexes, we get this execution plan:

NoFkIndexesAddedExecPlan

Oops. Except for the fact that we have Clustered Index Scan instead of Table Scan for most operators, the only real change to this is the inclusion of the Index Scan operation against the new index on the MyBusinessEntityAddress table. In short, no improvement at all.

OK, you might be thinking to yourself, what if this was a proper query and there was a WHERE clause. Let’s modify the queries:

[sourcecode language="sql"]SELECT p.LastName + ‘, ‘ + p.FirstName AS ‘PersonName’
FROM Person.Address AS a
JOIN Person.BusinessEntityAddress AS bea
ON a.AddressID = bea.AddressID
JOIN Person.BusinessEntity AS be
ON bea.BusinessEntityID = be.BusinessEntityID
JOIN Person.Person AS p
ON be.BusinessEntityID = p.BusinessEntityID
WHERE p.LastName LIKE ‘Ran%’

SELECT p.LastName + ‘, ‘ + p.FirstName AS ‘PersonName’
FROM dbo.MyAddress AS a
JOIN dbo.MyBusinessEntityAddress AS bea
ON a.AddressID = bea.AddressID
JOIN dbo.MyBusinessEntity AS be
ON bea.BusinessEntityID = be.BusinessEntityID
JOIN dbo.MyPerson AS p
ON be.BusinessEntityID = p.BusinessEntityID
WHERE p.LastName LIKE ‘Ran%’[/sourcecode]

The first query resulted in this execution plan:

ForeignKeyWhereExecPlan

Now clearly, more tuning is probably possible here since we’re still looking at an Index Scan, but the point is not whether or not the query is tuned, the point is, that the optimizer can eliminate tables because of the trust created by the foreign key constraints. And what did the other query produce?

NoFkIndexesAddedWhereExecPlan

Okay… The less said the better. Once again, tuning opportunities do exist, but we’re still accessing tables where it is absolutely not necessary, as demonstrated by the original query and it’s execution plan. The data returned has been the same, each and every time.

Back to the question, do foreign key constraints help performance? Let me ask you one in return. Have you ever looked to see which was done more in your OLTP, reads or writes? If you haven’t, take a look. I’ll be the answer surprises you. My answer to the question, yes, it can. Not that it will, but that it can.

31 Comments

  • By Robert L Davis, November 22, 2010 @ 8:41 am

    Nice!!

  • By Sandra A Mueller, November 22, 2010 @ 8:54 am

    Awesome Post! I wonder how the OLAP folks will take it — thou that resistance is because of ETL.

  • By Noel McKinney, November 22, 2010 @ 8:57 am

    Thanks Grant, the more articles like this the better, maybe the truth will finally sink in! One former employer had a policy of using foreign keys on development but then remove them on production “because they just slow the system down and the application guarantees integrity.” Needless to say, read activity hugely dominated write activity, plus a good chunk of my DBA time was spent cleaning up bad records inserted by the application.

  • By jon g, November 22, 2010 @ 8:57 am

    Excellent post.

  • By Michelle Ufford, November 22, 2010 @ 8:59 am

    Nice post, Grant! Thanks for that. :)

  • By Grant Fritchey, November 22, 2010 @ 9:04 am

    Thanks everyone. Struck a chord with this one evidently. But, it’s true, even very high write OLTP systems are usually more read than write, so stuff like this can, and will, matter.

  • By Brian Garraty, November 22, 2010 @ 9:52 am

    Excellent post Grant. When I grow up, I want to write blogs posts like this.

  • By Pat Wright, November 22, 2010 @ 10:56 am

    Great post Grant. Really helpful!

  • By Grant Fritchey, November 22, 2010 @ 1:41 pm

    Brian, don’t wait until you grow up. If I had to wait that long I still wouldn’t be blogging (since I still haven’t quite grown up). And thanks.

    Thanks Pat.

  • By Meredith Ryan-Smtih, November 22, 2010 @ 2:08 pm

    Excellent post! thanks for the clear demonstration of what could be a confusing topic.

  • By Wendy, November 22, 2010 @ 2:14 pm

    Pure SQL #awesomesauce Grant – Thank you for clarifying this in plain english!

  • By Grant Fritchey, November 22, 2010 @ 3:03 pm

    Thanks Wendy, Meredith.

  • By Dave Mulanaphy, November 22, 2010 @ 8:43 pm

    Great post Grant. First off let me say I completely agree with you about having FK’s. That being said it made me think a little bit about a system that does have really high volumes of writes. What are you thoughts about creating a plan guide with FK’s in place, removing them and using the plan guide to get the performance gain? I know it could be a maintenance nightmare. I better go shower after that last thought…

  • By Grant Fritchey, November 22, 2010 @ 8:50 pm

    Hey Dave,

    That’s actually not a bad idea. I’m not sure that it would work, and you’re still possibly going to run into integrity issues, but that could work… although… if I recall correctly, the plan guide can’t force the optimizer to do things it can’t do, like scan a table that’s not part of the query, that sort of thing… without the FK… I suppose I have another blog post now. Thanks.

  • By Johan Bijnens - ALZDBA, November 23, 2010 @ 4:30 am

    Once again you’ve proven that taking general guidelines into account pays off.
    The engine really takes these things into account and uses it to its – and finally your – advantage.
    “Tell your system what you know. e.g. PK, AK, FK”

    Thanks.

  • By Grant Fritchey, November 23, 2010 @ 6:40 am

    Thanks Johan. Glad you liked it.

  • By Mark Shay, November 26, 2010 @ 10:50 am

    Great Post!

  • By Johan Bijnens - ALZDBA, November 27, 2010 @ 3:24 pm

    Maybe the obvious is to obvious to get noticed, but people need to keep in mind that having contraints declared using “with nocheck” will only serve a documentation purpose. SQLServer knows that constraints “with nocheck” are not to be trusted, so it will not be able to optimze as shown in the execlent article.
    It takes time, but eventually we will get there ;-)

  • By Kev Riley, December 8, 2010 @ 4:51 am

    Great article!
    As ALZDBA states, you need to make sure your foreign keys are ‘trusted’ – check the is_not_trusted column in sys.foreign_keys – if this isn’t set, the execution plan will take the same route as the non-FK query.
    To set them simply ALTER TABLE … WITH CHECK CHECK CONSTRAINT …

    One final note: trusted is not the same as enabled – the FK can be enabled, but still untrusted – all that means is that SQL Server hasn’t verified it.

    Another great post on the trustworthy nature of FKs by the Datachix at http://datachix.com/2010/09/09/can-you-trust-your-foreign-key-or-feature-821-i-didnt-already-know-about-sql-server/

  • By David Wetherell, December 8, 2010 @ 3:35 pm

    This is something really simple that can make a big difference, great explanation as well.

  • By Muthukkumaran Kaliyamoorthy, May 2, 2011 @ 9:45 am

    Grant,
    Its great post.

    Grant you always “GRANT” (SQL GRANT)something to floks.

    That’s y your parents named you grant.

  • By Grant Fritchey, May 4, 2011 @ 1:10 pm

    Thanks! That’s a good one.

  • By sathish kumar, July 6, 2011 @ 2:31 am

    Hi,

    Understood well about using foreign key , thanks a lot for the article..

  • By JMoffitt, March 12, 2012 @ 3:24 pm

    And to be realistic, when in the history of relational databases has an application ever succeeded in enforcing the data integrity better than or even as well as the database could do it? I’ve used hundreds (maybe even thousands when you consider data imports where I see the data but not the orginal database) of different databases and I have never yet seen one that had data integrity if the FK relationship was not in place in the database. It is too easy to get around the application. And putting checks only in the application doesn’t consider that other data affecting activities will not always go through the data layer, even if you think they will.

  • By Steve Searle, April 12, 2012 @ 12:03 am

    Grant, curious to know, if you add the predicate “bea.AddressID <= 1000" to your queries, why does the optimizer no longer eliminate the Person.Address table from the execution plan? I would've thought that the trust created by the foreign key would still hold in this case, but it seems not.

  • By Sachit, April 19, 2012 @ 5:35 am

    Thanks for this article Grant. But I have one question, you said after building indexes also on other set of tables dbo.MyAddress, dbo.MyBusinessEntityAddress, dbo.MyBusinessEntity and dbo.MyPerson, you got execution plan with tables not really required. But I observed that when you create unique clustered indexes or primary keys and exclude creation of foreign keys, still you get perfect execution plan with only minimal (desired) set of tables. Could you please recheck?

  • By Mani, November 2, 2012 @ 2:54 pm

    Simply awesome, good Explanation! – Never feed wrong inputs to Optimizer, you query will hung!

Other Links to this Post

  1. SQL Server Links and news for the week 26/11/10 | John Sansom - SQL Server DBA in the UK — November 26, 2010 @ 8:27 am

  2. The Twelve Days of SQL Series | Brent Ozar - Too Much Information — December 8, 2010 @ 8:16 am

  3. Foreign Keys Help Performance « Voice of the DBA — May 2, 2011 @ 7:33 am

  4. Are Foreign Keys Better Than Indexes? | Home Of The Scary DBA — September 13, 2013 @ 10:30 am

RSS feed for comments on this post. TrackBack URI

Leave a comment