Database Fundamentals #26: The Primary Key War

There is a war about primary keys in the database world. There are two camps. The first camp believes that primary keys should only ever be created on meaningful information. For example, there is an ISO standard for the abbreviation of state names in the United States. You could create a table for looking up state names and make the primary key that abbreviation because it is guaranteed to be unique. The other camp believes that primary keys should never be created on meaningful information because, meaningful information is subject to change and you don’t want your primary keys to be changing. A changing primary key means changing all the tables that are related to that value. This camp believes that all primary keys should be artificial. As far as I’m concerned, both camps are 100% correct. My own personal preference is towards the artificial key camp, but I have built systems with natural keys and they can work just fine. The examples we’re using in these blog posts are for artificial keys, but that is just me staying in my comfort zone. I am not attempting to advocate for one camp or the other. I’ll leave it to others to have that fight. I just want to make you aware that you will see both types in different databases. You may see both types in the same database. In upcoming examples, I will even show you something that is a combination of both types.

One point in favor of natural keys, just because you have an artificial key, you’re not done. You still must define the natural key and set it up as a candidate, or alternate, key in order to be sure that you’re meeting the business requirements for uniqueness as well as simply uniquely identifying the row. If you do not do this, then it will be possible to violate business rules such as having more than one state abbreviation with identical values. So, while artificial keys are easier to set up and maintain, and can perform better, you have the added overhead of having to also set up and maintain the natural key anyway. Don’t ever forget this point if you do choose to work with artificial keys.

One point in favor of artificial keys, data compliance through mechanisms like the GDPR requires you to change the data going to non-production environments so that privacy is maintained. A natural key could also be personally identifying information. Having to change all primary key and foreign key values across your database could be a massive undertaking. Artificial keys, used appropriately, completely avoid this issue.

Conclusion

Artificial vs. Natural keys is not a hill I wish to die on. Nor should you. There are so many more important things to worry about when building your database. The next post in the series will be on the scripts necessary to define primary keys.

2 thoughts on “Database Fundamentals #26: The Primary Key War

  • Sean Redmond

    In the case of something not expected to change soon (or possibly marginally in our life times), like the states in the U.S., either artificial or natural keys (say the official two-letter code) would do me fine, although I would take the artificial key as a matter of habit. This also applies to other enumeration-type tables that are generally very static.

    What is more important to me is the datatype chosen. If it is to be an artificial key then TINYINT, if it is to be a natural key, then CHAR(2). This key could be well be used in a table with a 100 million rows and then every byte in the primary key adds on extra 100 million bytes. That is an extra 12,207 pages *per byte* or en extra 36,621 pages if the developer has set the datatype to INT on the grounds of old habits.

    And, in this case, if there is a database with a 100-million row table that has state foreign key field, there is a case to be made for the 2-byte CHAR(2) — it may very well save many needless joins in queries when only the 2-character state code is required — against the smaller number of pages saved with the TINYINT primary key.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.