Everyone knows that you only get a single clustered index, right? Wouldn’t it be great though if you could have two clustered indexes?
Well, you can. Sort of. Let’s talk about it.
Two Clustered Indexes
First I’m going to create a table:
DROP TABLE IF EXISTS dbo.od; GO SELECT pod.PurchaseOrderID, pod.PurchaseOrderDetailID, pod.DueDate, pod.OrderQty, pod.ProductID, pod.UnitPrice, pod.LineTotal, pod.ReceivedQty, pod.RejectedQty, pod.StockedQty, pod.ModifiedDate INTO dbo.od FROM Purchasing.PurchaseOrderDetail AS pod;
With that in place, let’s start with a clustered index:
CREATE CLUSTERED INDEX TestCIndex ON od (ProductID);
And, a query to test with:
SELECT od.PurchaseOrderID, od.PurchaseOrderDetailID, od.DueDate, od.OrderQty, od.ProductID, od.UnitPrice, od.LineTotal, od.ReceivedQty, od.RejectedQty, od.StockedQty, od.ModifiedDate FROM dbo.od WHERE od.ProductID BETWEEN 500 AND 510 ORDER BY od.ProductID;
This results in the following execution plan:
OK. Well done, Grant. That’s how a clustered index works. The data is sorted and stored at the leaf level based on the key column or columns. So a query like this where the filtering criteria is on the clustered key, and the ordering criteria is on the clustered key, with no additional filtering required, gets a plan that’s an ordered seek on the clustered index. You widget, you.
Fine. Let’s drop that index, and create a new one:
DROP INDEX IF EXISTS TestCIndex ON dbo.od CREATE INDEX TestNCIndex ON dbo.od (ProductID) INCLUDE ( PurchaseOrderDetailID, DueDate, OrderQty, PurchaseOrderID, UnitPrice, LineTotal, ReceivedQty, RejectedQty, StockedQty, ModifiedDate );
If we run our query again, this is now the execution plan:
Because we have stored all the columns of the table as INCLUDE columns at the leaf level of the nonclustered index, we effectively have a second clustered index.
Now, don’t get too excited, let’s talk about it.
It’s Not a Clustered Index
First up, I didn’t mention performance above, so let’s mention it now. The clustered index performed as follows:
The nonclustered index:
If we round, these ran in the same amount of time, 3 milliseconds. Yeah, there’s a 6% variance there over 100 executions, but that’s pretty small, especially when talking microseconds. So we can say, they ran in the same amount of time.
What about the reads?
Let’s look at one other query, against each index:
SELECT i.NAME, i.type_desc, s.page_count, s.record_count, s.index_level FROM sys.indexes i JOIN sys.dm_db_index_physical_stats(DB_ID(N'AdventureWorks'), OBJECT_ID(N'dbo.od'), NULL, NULL, 'DETAILED') AS s ON i.index_id = s.index_id WHERE i.OBJECT_ID = OBJECT_ID(N'dbo.od');
We get the results back and the nonclustered index consists of 84 pages while the clustered index consists of 87. What the heck?
Well, look, the nonclustered index is not a clustered index. You can tell because the results for the query above are going to show just the pages for the clustered index, or, they’re going to show the nonclustered index pages as well as the pages for the heap table. Yeah, just because we’re defining an index that includes all the data, doesn’t mean the data is still not stored somewhere. How many pages are in the heap? 87.
I didn’t get into the weeds on this, but I’m pretty sure it has to do with the row locator. See, for a heap, you store the RID along with the row. For a clustered index, of course, you don’t do that… but… this isn’t a unique clustered index, so along with the key, we need a uniquifier to give us a row locator. Hence, just a little bigger, with the possibility for just slightly more reads.
Again, don’t get excited. I’m absolutely not suggesting that, ta-da, you get two, or more, clustered indexes for free. There are serious ramifications here. You’re doubling (or close to it), your storage. You’re adding overhead to every single write operation. TANSTAAFL certainly applies here.
However, if you’re willing to pay the cost, yeah, you can have two clustered indexes. Effectively. Kind of.