Never, Ever Use Clustered Indexes

This whole concept of the clustered index as a foundational structure within SQL Server is just plain nuts. Sure, I get the concept that if a table has a clustered index, then that index actually becomes the table. When you create a clustered index on a table, the data is now stored at the leaf level of the Balanced Tree (b-tree) page distribution for that index, and I understand that retrieving the data using a seek on that index is going be extremely fast because no additional reads are necessary. Unlike what would happen with a non-clustered index on a heap table.

Yes, I get that if I store my data in a heap, the only way to access the data is through the Index Allocation Mapping (IAM)  pages that define extents and this means that I don’t get the double-linked list of pages that occur within clustered indexes. I know that having to read the IAM leads to additional reads for a heap to look up within the IAM in order to find the locations of the data on the disk.

I realize that updating or deleting a clustered index is helped by being able to use the index itself to find the exact row that needs to be modified or removed. I’ve also seen the tests that show that clustered indexes work faster on inserts in the overwhelming majority of situations within SQL Server. But I still want you to stop using clustered indexes on all your tables within SQL Server. Why? Because that’s how Oracle databases are mostly designed.

I hope you’ve figured out by now that I’m joking about tossing out clustered indexes within your SQL Server databases. I do believe that, unless you have a very thoroughly tested exception, every table within SQL Server should have a clustered index for some of the reasons that I’ve listed above, as well as several others. But Oracle DBAs design their systems differently.

When I see a vendor that makes a product that is exactly the same on Oracle, SQL Server, and possibly DB2 or MySQL, I have to ask myself, just how well is that system going to perform. When I hear someone tell me to design the system using lowest common denominator T-SQL because “we don’t want to be locked into a particular vendor” I have to wonder, again, how are we going to make this system perform. Because, if Oracle likes heaps, but SQL Server likes clusters, how do you design for both? I’d say you can’t.

In fact, I’d argue that you need to design precisely for specific relational database management systems because, let’s face it, they don’t implement the fundamentals in the same way. If you mess up the fundamentals, you’ve just messed up your entire design.

21 thoughts on “Never, Ever Use Clustered Indexes

  • Narayana

    Then can we ever justify the use of ORM in anything other than RAD ? Or atleast ORMs when used with a view of switching DB vendors at a later point in time?

  • Brandon Leach

    Great post Grant!

    Naranya, I think ORMs are great for smaller applications. However with larger ones they can cause a lot of problems with performance.

    If portability is the issue then abstracting away the database via interfaces may be better. That way you can create concrete classes that implement those interfaces for each database platform. You can then load up the correct implementation based on say a config file. This way your application cares not what the underlying DB platform is. It also means that to add support for a new DB platform you only need to create a new interface implementation.

  • Todd Everett

    Grant:

    This post is SPOT ON! As a Data Architect I have to work with 3 different platforms – SQL Server, DB2, and Oracle. You really sum up what I often face in that we have a tendency to design database schema’s at the logical level only and not take into consideration the fundamentals of each individual platform. But doing so is really tough if you have responsibility for multiple platforms as keeping up with just one well enough to perform effective design is a full time job!

    Your post prompted me to do a google search on “oracle clustered index” as I know that in DB2 you have “clusterING” indexes, where DB2 tries to keep the data in the table space ordered like the “clusterING” index, but if it can’t find space on the page where the row should go it looks left and right 16 pages and puts it there, and if it still can’t find space puts it at the end. I wondered about Oracle. The first hit was this link: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:586423377841. Its fascinating as its your point in reverse – someone wanting to build an oracle DB like a sybase DB. Tom’s statement that caught my eye was “To me, it is better to understand what Oracle has to offer (b*tree clusters, hash clusters, single
    table hash clusters, sorted hash clusters, index organized tables, heap tables, external tables,
    partitioned tables, ……….) … but don’t ever try to do the “in sybase you do this, in oracle you do that”, there are just way way
    way too many caveats and you’ll end up with a really poorly performing application” This guy Tom is right there with you.

    So do you think we are getting to a point in specialization where we need to have a data architect who focuses on logical schema design (tables, columns, and constraints) and works with a “[Platform] Database Architect” who focuses on how to make that particular logical schema work best in his or her platform of focus? Do you see anyone doing this in your travels?

    Thanks again for the great thought provoking post!

    -Todd

  • Grant,
    You are right on the mark in terms of designing the database for a specific platform. If you look at the ANSI SQL standard for instance…it is this big honking document but the part that says whether a database is compliant is a small reference in the back of Appendix B that is soooo shallow that anyone could create an ANSI complaint system in an afternoon. So NONE of systems are totally standard compliant and developing some cross-database complaint design that is also performant is a myth.
    As for the ORM questions earlier in the comments, ORMs are performant or not based on how they are implemented in the code. So how the developer interacts with the generated code determines a large part of how well it behaves itself. Truth be told, some ORMs like EF4 allow you to generate POCO models and with proper coding of a repository layer or such you can abstract away the fact that you are using an ORM in the background at all.

    AJ

  • Thanks to everyone for their comments.

    As to the ORM tools. I have no issues with ORM tools. I think they are amazing and can be used to great effect to speed up the development process. But… Too many of the projects that I’ve seen that use ORMs are not Mapping between an Object design and a Relational design. They have been used to shoehorn objects straight into the database and then used lowest common denominator SQL. These have had horrific performance. Not merely bad performance, but practically standing still performance that made the projects fail for the business.

    So, if your approach to ORM is to ignore the relational system that you’re mapping to, then yeah, you may be falling into this trap. If you’re still acknowledging that you’re dealing with a relational system, regardless of whether it’s Oracle, DB2, SQL Server, whatever, and you’re building a database and using the Mapping aspect of ORM to connect to your back-end from your front-end, you’re fine.

  • Tod,

    Yeah, I’ve been that guy. I’ve worked on systems where we designed an Oracle database & a SQL Server database side-by-side, but each one did it “their way.” There were fights (I refuse to abide by 30 character naming limits doggone it), but overall it worked well.

  • John Blaylock

    Clustered indexes, more specifically which columns to use for a clustered index, is something I could debate for hours, The benefit of clustering on columns that are used in range searched vs an identity column.

    From my experience, I rarely do any queries with range searches on an identity column, typically it’s a query for a specific row and surely a unique non clustered index will work fine (covered if necessary).

    The issue I see with not clustering on an identity column is the page spliting and fragmentation due to lots of inserts.

  • HAhaha. I just LOVE this post.If is I who say this, and I really think this a long time, I am punished to death.Fortunately now I have a post from a MVP Guru to give me back when I said it again.. Grant I love you 🙂

  • For a very brief moment I thought someone had hacked your blog! 🙂

    I will point this post to you-know-who-all and hope they understand that..just like Java is not the same as .NET although both are OO languages, SQL Server needs a different approach than Oracle in the RDBMS world.

  • Stephanie B

    “how do you design for both? I’d say you can’t.”

    Tom Kyte makes the same point in his excellent book –
    Expert Oracle Database Architecture

  • Stephanie B

    To quote Tom Kyte:

    “By now, you might be able to see where I’m going in this section. I have made references above to other
    databases and how features are implemented differently in each. With the exception of some read-only
    applications, it is my contention that building a wholly database-independent application that is highly
    scalable is extremely hard — it is, in fact, quite impossible unless you know exactly how each database
    works in great detail. And, if you knew how each database worked in great detail, you’d understand that
    database independence is not something you really want to achieve (a very circular argument!).”

    Tom Kyte -Expert Oracle Database Architecture Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions

  • That’s excellent Stephanie. Thank you. In a different format I was castigated on this point of view by others that said “Well, of course, this is something you expect from Microsoft people.” Nice to know I’m not alone.

  • Stephanie B

    I agree 100% with both you and Tom Kyte.

    Unfortunately I have seen some disastrous real-life results of someone applying that what may be good practice for Oracle to SQL Server, and vice versa.

  • spydon

    Why wouldn’t oracle allow clustered indexing? They do, they just call it Indexed Organized Tables (IOT). (They also did at the time you were writing this)

  • Thanks for the feedback. I didn’t say that they didn’t allow them. I said that in the majority of cases they aren’t used. It’s just not as big a deal in Oracle as it is in SQL Server. Jonathan Lewis, an Oracle Ace, and I even had a long discussion about the differences in use and importance of the clustered index on the two platforms: http://www.red-gate.com/products/oracle-development/deployment-suite-for-oracle/education/webinars/webinar-oracle-heaped-tables-sql-clustered-indexes.htm

    You’re right. Oracle absolutely supports it.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.