Effects of Persisted Columns on Performance

I live for questions. And my favorite questions are the ones where I’m not completely sure of the answer. Those are the questions that make me stop presenting in order to take a note so I can try to answer the question later, usually in a blog post. Guess where we are today?

I was asked at SQL Bits in London about the direct impact of the PERSISTED operator on calculated columns, both inserts and selects. I didn’t have a specific answer, so I wrote it down for later (and asked the, self-described, persisting Dane, to email me to remind me. He did, so I put together a few tests to try to answer his question.

First, I created three tables:

The first table has two calculated columns, one PERSISTED and one not. The second table has a non-persisted column only. The third table has a persisted column only. This way, I can see the behavior across all these tables and compare them to see where PERSISTED adds to, or removes from, performance.

I took a very simple set of queries and ran these a few times on each table to get an average execution time:

The results were:

Persist Persist 2 Persist 3
3580.5 1626 2260.5

On average, across about five executions, you can see a distinct difference of about 600 microseconds between Persist 2 and Persist 3, and an even larger average on the single Persist table. Let’s try the same thing with a few SELECT statements:

While the results are spectular:

Persist Persist 2 Persist 3
115.5 123.5 109.5

You can see that making the calculation on the fly is more costly with both Persist and Persist2 being higher than Persist3, which had no calculations to retrieve data.

Now, let’s run the same thing, but, how about, 10,000 times so we can really see what the differences are over time. First, the inserts:

Persist Persist 2 Persist 3
84854 68783 73817

Still measured in microseconds, you can see that the cumulative totals are higher for the persisted columns than they are for the non-persisted column. So, there is an overhead for storing the extra information. But, what about queries?

Persist Persist 2 Persist 3
Average 221.25 222.16 100.54
Cumulative 2203431 2254238 1005469

I’d say that’s pretty conclusive. When I hit the SELECT statement 10,000 times instead just five or six, we clearly have differences in behavior. Interestingly enough, the logical reads were identical at 20,000 each. So it really did come down to the time spent calculating the values versus the time spent just retrieving them.

Caveats. This is a very simple test with simple calculations and small persisted values. Even doing something 10,000 times isn’t the same as doing it one million or one hundred million. In short, your mileage may vary, some assembly required, for queries lasting longer than four hours call your physician, it depends.

You can see that a PERSISTED column is going to negatively impact INSERT, but, it’s going to positively impact SELECT. The question is, which one is giving you pain at the moment.

UPDATE: Transposed the data in the final table between Persist 2 & Persist 3. Fixed it.

15 thoughts on “Effects of Persisted Columns on Performance

  • Tim Cartwright

    Maybe I am misreading your results table on the selects, but it looks like the persisted tables performed horribly in comparison?

    Persist 2 with no persisted columns took less than half the time versus the persisted columns?:

    Persist Persist 2 Persist 3
    Average 221.25 100.54 222.16
    Cumulative 2203431 1005469 2254238

  • Mark Stacey

    You missed an important test: what happens when you index a persisted column vs indexing a non persisted column? Does a NCI have a value for non persisted, or does it still calculate it?

  • Andrew Whettam

    Thanks Grant, good article. Another comparison that it would be useful to include is the effect of storing the calculated values directly in a field and doing the calculation during the insert. This is quite commonly done when loading a data warehouse, for example. It would be interesting to compare the storage impact as well.

  • Heather

    I am very new to SQL, so this may be a terrible question. Did you account for SSMS’s tendency to try to predict the best execution plan? If so, how much would that impact your performance over time during the query executions?

  • Hi Heather,

    Welcome to SQL Server and thanks for posting on the blog. Great question.

    In this case, the execution plans wouldn’t change the behavior. In the case of the PERSISTED columns, it’s just going to read from the column in a SELECT statement and it’s going to do the calculation at the point of storage for any data modification queries. Neither of these would lead to a different execution plan over time. In the case of the non-persisted columns, same thing in reverse. If you look at the execution plans for the SELECT, you’ll see the calculation occurring there.

    Where you would see differences in the execution plans would be if you were using these columns in the WHERE, ON or HAVING clauses. As Mark mentions above, another set of tests would be to do this with indexes in place on the columns to see the effects there. I’ll see if I can get those tests put together soon.

  • Thomas Franz

    Another scenario to test: what happens, if you have a nonclustered index on RandomValue1 and RandomValue2 but no one on the persisted (or nonpersisted) CalcValue?

    Would it use ignore the persisted column and read just from the NCI (and do the calcualtion) or would it do a clustered index scan / lookup (even if you are just reading e.g. WHERE RandomValue1 < 100) to get the Calc Value.

  • Keith Corser

    Grant several years ago SQLCat was recommending using a table with a tinyint persisted column as a HashID to partition the table for faster very high inserts by eliminating hot pages. Based upon your tests do you feel their suggestion helps or hurts performance?

  • Keith Corser

    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[Message](
    [MessageKey] [bigint] IDENTITY(1,1) NOT NULL,
    [Created] [datetime] NULL,
    [XmlMessage] [xml] NOT NULL,
    [HashID] AS CAST(ABS([MessageKey] % 6) AS TINYINT) PERSISTED NOT NULL)
    GO

    CREATE UNIQUE CLUSTERED INDEX CIX_Hash
    ON [dbo].[Message] ([MessageKey], [HashID]) ON ps_hash([HashID])
    GO

    ALTER TABLE [dbo].[Message] ADD DEFAULT (getdate()) FOR [Created]
    GO

  • Keith,

    To answer for sure, I’d need to set up tests. However, I tend to trust SQLCat. I know a bunch of them and they generally have their stuff in a single bag.

    That’s a very simple calculation, so, while you will pay a little up front on the inserts, it’s not likely to be much. Yeah, I suspect this will probably work relatively well to split the inserts across the partitions defined by the hash. The only thing that would scare me is the only that does scare me in partitions and that’s scans across them. This is probably a good solution for hot pages.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.