Effects of Persisted Columns on Performance

05May 2015 by Grant Fritchey 15 Comments

I live for questions. And my favorite questions are the ones where I’m not completely sure of the answer. Those are the questions that make me stop presenting in order to take a note so I can try to answer the question later, usually in a blog post. Guess where we are today?

I was asked at SQL Bits in LondonÂ about the direct impact of the PERSISTED operator on calculated columns, both inserts and selects. I didn’t have a specific answer, so I wrote it down for later (and asked the, self-described, persisting Dane, to email me to remind me. He did, so I put together a few tests to try to answer his question.

First, I created three tables:

CREATE TABLE dbo.PersistTest (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

CREATE TABLE dbo.PersistTest2 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2)
);
GO

CREATE TABLE dbo.PersistTest3 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

The first table has two calculated columns, one PERSISTED and one not. The second table has a non-persisted column only. The third table has a persisted column only. This way, I can see the behavior across all these tables and compare them to see where PERSISTED adds to, or removes from, performance.

I took a very simple set of queries and ran these a few times on each table to get an average execution time:

INSERT dbo.PersistTest (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);

The results were:

Persist	Persist 2	Persist 3
3580.5	1626	2260.5

On average, across about five executions, you can see a distinct difference of about 600 microseconds between Persist 2 and Persist 3, and an even larger average on the single Persist table. Let’s try the same thing with a few SELECT statements:

SELECT  pt.CalcValue,
        pt.PersistValue
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.CalcValue
FROM    dbo.PersistTest2 AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.PersistValue
FROM    dbo.PersistTest3 AS pt
WHERE   pt.PersistTestID = 3;
GO

While the results are spectular:

Persist	Persist 2	Persist 3
115.5	123.5	109.5

You can see that making the calculation on the fly is more costly with both Persist and Persist2 being higher than Persist3, which had no calculations to retrieve data.

Now, let’s run the same thing, but, how about, 10,000 times so we can really see what the differences are over time. First, the inserts:

Persist	Persist 2	Persist 3
84854	68783	73817

Still measured in microseconds, you can see that the cumulative totals are higher for the persisted columns than they are for the non-persisted column. So, there is an overhead for storing the extra information. But, what about queries?

	Persist	Persist 2	Persist 3
Average	221.25	222.16	100.54
Cumulative	2203431	2254238	1005469

I’d say that’s pretty conclusive. When I hit the SELECT statement 10,000 times instead just five or six, we clearly have differences in behavior. Interestingly enough, the logical reads were identical at 20,000 each. So it really did come down to the time spent calculating the values versus the time spent just retrieving them.

Caveats.Â This is a very simple test with simple calculations and small persisted values. Even doing something 10,000 times isn’t the same as doing it one million or one hundred million. In short, your mileage may vary, some assembly required, for queries lasting longer than four hours call your physician, it depends.

You can see that a PERSISTED column is going to negatively impact INSERT, but, it’s going to positively impact SELECT. The question is, which one is giving you pain at the moment.

UPDATE: Transposed the data in the final table between Persist 2 & Persist 3. Fixed it.

15 thoughts on “Effects of Persisted Columns on Performance”

Brent Ozar

There can also be an impact on CHECKDB, too: http://scorellis.com/?p=294

May 5, 2015 at 10:20 am
Reply
Grant Fritchey

Excellent! I didn’t know that at all. Thanks for sharing.

May 5, 2015 at 10:26 am
Reply
Tim Cartwright

Maybe I am misreading your results table on the selects, but it looks like the persisted tables performed horribly in comparison?

Persist 2 with no persisted columns took less than half the time versus the persisted columns?:

Persist Persist 2 Persist 3
Average 221.25 100.54 222.16
Cumulative 2203431 1005469 2254238

May 5, 2015 at 3:53 pm
Reply
Grant Fritchey

Fixed it. Twice. I shouldn’t be allowed out without a keeper.

May 5, 2015 at 6:17 pm
Reply
Mark Stacey

You missed an important test: what happens when you index a persisted column vs indexing a non persisted column? Does a NCI have a value for non persisted, or does it still calculate it?

May 6, 2015 at 2:42 am
Reply
Grant Fritchey

Good point Mark. May have to have another blog post.

May 6, 2015 at 5:42 am
Reply
Andrew Whettam

Thanks Grant, good article. Another comparison that it would be useful to include is the effect of storing the calculated values directly in a field and doing the calculation during the insert. This is quite commonly done when loading a data warehouse, for example. It would be interesting to compare the storage impact as well.

May 13, 2015 at 6:08 am
Reply
Heather

I am very new to SQL, so this may be a terrible question. Did you account for SSMS’s tendency to try to predict the best execution plan? If so, how much would that impact your performance over time during the query executions?

May 13, 2015 at 8:45 pm
Reply
Grant Fritchey

Hi Heather,

Welcome to SQL Server and thanks for posting on the blog. Great question.

In this case, the execution plans wouldn’t change the behavior. In the case of the PERSISTED columns, it’s just going to read from the column in a SELECT statement and it’s going to do the calculation at the point of storage for any data modification queries. Neither of these would lead to a different execution plan over time. In the case of the non-persisted columns, same thing in reverse. If you look at the execution plans for the SELECT, you’ll see the calculation occurring there.

Where you would see differences in the execution plans would be if you were using these columns in the WHERE, ON or HAVING clauses. As Mark mentions above, another set of tests would be to do this with indexes in place on the columns to see the effects there. I’ll see if I can get those tests put together soon.

May 14, 2015 at 2:50 am
Reply
(SFTW) SQL Server Links 15/05/15 - John Sansom

[…] Effects of Persisted Columns on PerformanceÂ –Â Grant Fritchey (Blog|Twitter) […]

May 15, 2015 at 3:01 am
Reply
John Tunnicliffe

Nice post!

May 15, 2015 at 5:12 am
Reply
Thomas Franz

Another scenario to test: what happens, if you have a nonclustered index on RandomValue1 and RandomValue2 but no one on the persisted (or nonpersisted) CalcValue?

Would it use ignore the persisted column and read just from the NCI (and do the calcualtion) or would it do a clustered index scan / lookup (even if you are just reading e.g. WHERE RandomValue1 < 100) to get the Calc Value.

May 18, 2015 at 4:38 am
Reply
Keith Corser

Grant several years ago SQLCat was recommending using a table with a tinyint persisted column as a HashID to partition the table for faster very high inserts by eliminating hot pages. Based upon your tests do you feel their suggestion helps or hurts performance?

May 20, 2015 at 8:07 pm
Reply
Keith Corser

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Message](
[MessageKey] [bigint] IDENTITY(1,1) NOT NULL,
[Created] [datetime] NULL,
[XmlMessage] [xml] NOT NULL,
[HashID] AS CAST(ABS([MessageKey] % 6) AS TINYINT) PERSISTED NOT NULL)
GO

CREATE UNIQUE CLUSTERED INDEX CIX_Hash
ON [dbo].[Message] ([MessageKey], [HashID]) ON ps_hash([HashID])
GO

ALTER TABLE [dbo].[Message] ADD DEFAULT (getdate()) FOR [Created]
GO

May 20, 2015 at 8:17 pm
Reply
Grant Fritchey

Keith,

To answer for sure, I’d need to set up tests. However, I tend to trust SQLCat. I know a bunch of them and they generally have their stuff in a single bag.

That’s a very simple calculation, so, while you will pay a little up front on the inserts, it’s not likely to be much. Yeah, I suspect this will probably work relatively well to split the inserts across the partitions defined by the hash. The only thing that would scare me is the only that does scare me in partitions and that’s scans across them. This is probably a good solution for hot pages.

May 21, 2015 at 6:38 am
Reply