May 05 2015

Effects of Persisted Columns on Performance

I live for questions. And my favorite questions are the ones where I’m not completely sure of the answer. Those are the questions that make me stop presenting in order to take a note so I can try to answer the question later, usually in a blog post. Guess where we are today?

I was asked at SQL Bits in London about the direct impact of the PERSISTED keyword on calculated columns, for both inserts and selects. I didn’t have a specific answer, so I wrote it down for later (and asked the, self-described, persisting Dane to email me to remind me). He did, so I put together a few tests to try to answer his question.

First, I created three tables:

CREATE TABLE dbo.PersistTest (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

CREATE TABLE dbo.PersistTest2 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
CalcValue AS (RandomValue1 + RandomValue2)
);
GO

CREATE TABLE dbo.PersistTest3 (
PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
RandomValue1 DECIMAL(5,3),
RandomValue2 DECIMAL(5,3),
PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

The first table has two calculated columns, one PERSISTED and one not. The second table has a non-persisted column only. The third table has a persisted column only. This way, I can see the behavior across all these tables and compare them to see where PERSISTED adds to, or removes from, performance.

I took a very simple set of queries and ran these a few times on each table to get an average execution time:

INSERT dbo.PersistTest (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);
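
Only the insert for the first table is shown; the equivalent statements for the other two tables follow the same pattern (this is my reconstruction, since they aren’t in the post):

INSERT dbo.PersistTest2 (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);

INSERT dbo.PersistTest3 (
RandomValue1,
RandomValue2) VALUES (
42.2 , -- RandomValue1 - decimal
44.4  -- RandomValue2 - decimal
);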

The results were:

Persist    Persist 2    Persist 3
3580.5     1626         2260.5

On average, across about five executions, you can see a distinct difference of about 600 microseconds between Persist 2 and Persist 3, and an even larger average on the single Persist table. Let’s try the same thing with a few SELECT statements:

SELECT  pt.CalcValue,
        pt.PersistValue
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.CalcValue
FROM    dbo.PersistTest2 AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.PersistValue
FROM    dbo.PersistTest3 AS pt
WHERE   pt.PersistTestID = 3;
GO

The results this time were hardly spectacular:

Persist    Persist 2    Persist 3
115.5      123.5        109.5

Still, you can see that making the calculation on the fly is more costly, with both Persist and Persist 2 being higher than Persist 3, which had no calculation to perform in order to return the data.

Now, let’s run the same thing, but, how about, 10,000 times so we can really see what the differences are over time. First, the inserts:

Persist    Persist 2    Persist 3
84854      68783        73817

Still measured in microseconds, you can see that the cumulative totals are higher for the persisted columns than they are for the non-persisted column. So, there is an overhead for storing the extra information. But, what about queries?

             Persist    Persist 2    Persist 3
Average      221.25     222.16       100.54
Cumulative   2203431    2254238      1005469

I’d say that’s pretty conclusive. When I hit the SELECT statement 10,000 times instead of just five or six, we clearly have differences in behavior. Interestingly enough, the logical reads were identical at 20,000 each. So it really did come down to the time spent calculating the values versus the time spent just retrieving them.
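
For reference, here’s a minimal sketch of one way to repeat a statement 10,000 times and pull the cumulative numbers. The post doesn’t show the exact harness used, so this assumes the batch-repeat trick in SSMS/sqlcmd (GO <n>) and the query stats DMV, which reports times in microseconds:

-- Repeat the SELECT 10,000 times (GO <n> re-runs the preceding batch n times
-- in SSMS or sqlcmd).
SELECT  pt.CalcValue,
        pt.PersistValue
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
GO 10000

-- Pull cumulative and average elapsed time (microseconds) plus logical reads
-- from the plan cache after the runs.
SELECT  dest.text,
        deqs.execution_count,
        deqs.total_elapsed_time,
        deqs.total_elapsed_time / deqs.execution_count AS avg_elapsed_time,
        deqs.total_logical_reads
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text LIKE '%PersistTest%';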

Caveats. This is a very simple test with simple calculations and small persisted values. Even doing something 10,000 times isn’t the same as doing it one million or one hundred million. In short, your mileage may vary, some assembly required, for queries lasting longer than four hours call your physician, it depends.

You can see that a PERSISTED column is going to negatively impact INSERT, but it’s going to positively impact SELECT. The question is, which one is giving you pain at the moment?

UPDATE: Transposed the data in the final table between Persist 2 & Persist 3. Fixed it.

May 01 2015

Speaker of the Month: May 2015

I finally got out to some community events last month and got to see a bunch of great speakers. Not that you don’t see great speakers at paid events, you do. It’s just that they’re usually professional speakers and I want to encourage others when I can.

The Speaker of the Month for May 2015 is John Sterrett (b|t) and his presentation Automate Your Daily Checklist With PBM and CMS at SQL Saturday Boston.

The very first thing that impressed me about this presentation was how John entered the room and immediately started interacting with the attendees. You see a lot of speakers hiding behind the lectern, fiddling with cables, nervously sipping water. Not John. He was chatting as he set up, immediately getting control of the room. It’s a great way to start. Then, it was a very informative lecture. He showed how to use Policy Based Management and the Central Management Server together to create really meaningful monitoring and double checks on the servers. It’s a technique I’ve used, so I really enjoyed seeing it presented. Plus, I learned a few things. For example, I hadn’t realized you can run the Best Practices Analyzer from PBM. His slides were clear and informative. A little too much documentation for my taste, but I know some people love that type of deck, so I’m not going to ding people for it. His demos were very clear and illustrative of the points he was making. John handled questions well, and he used ZoomIt liberally and carefully to make everything clear. It was a fantastic presentation that was very well delivered.

A couple of things I wasn’t entirely crazy about. John kept checking the slides on the projected screen. Come to find out, it was because he couldn’t see them on his laptop. OK, I forgive that. While he was pretty good at repeating the questions, a few times he forgot. It’s so easy to do when you’re speaking (guilty). Also, John didn’t have a necessary dongle for connecting his machine. Never assume you’ll have VGA/HDMI because one of them will always be wrong. But that’s it.

John’s presentation manner is a lot like talking to him. He’s real calm, looks you right in the eye, listens carefully to questions and then gives direct answers. It was a real pleasure to watch. I suggest checking out his sessions when he presents next.

Apr 27 2015

Benefits for Some, All or Only a Few

As a member of the PASS Board of Directors I attended the PASS Business Analytics Conference (BAC) recently. You can read more about it here and here (as well as here).

Let me start with an important note: I am voicing my opinion here as an individual, not an official stance of the PASS organization.

There is controversy around the BAC because of a whole bunch of things, but one question in particular bothered me. It was suggested that the people attending the BAC were just consuming the worth or value that other people who paid for the Summit generated. At first, I just dismissed this concept. It stuck in the back of my mind though. Suddenly I realized why.

Yes, the BAC was partly paid for by Summit. But the attendees at the BAC were not all people who would have attended Summit. Maybe 1/3 of them have attended Summit, are going to attend Summit, or might attend Summit. That means the majority will not.

So?

Money from Summit is used to support Chapters. Anyone ever canvassed their attendees at a local user group for who has gone or will go to Summit? I have. Most of the time, far less than 1/3. Do we cut funding for Chapters?

Money from Summit is used to support SQL Saturday. Once again, I’ve canvassed several of these for people who were going to be attending Summit. Again, way less than 1/3. No more funding for SQL Saturday?

How about the Virtual Chapters that money from Summit pays for? How many of those people are attending Summit? I don’t know, but I’d be shocked if it’s 100% or anything close to that. Are we cutting Virtual Chapters?

24 Hours of PASS is also paid for by Summit.

You know, everything that PASS does, whether you like it, and attend it, or not, is paid for by Summit. There are good arguments to be made that we should not be doing the BAC (and arguments that we should). Where the money comes from is absolutely not a part of that argument. Otherwise, we must pull funding from anything and everything that is done by PASS that doesn’t translate to 100% benefits for the people who paid for it, Summit attendees.

I believe that we, the members of PASS, should be open and accepting and willing to try new things, both from a technical perspective and from a personal one. Providing training and community is what we do. Let’s focus on that.

Apr 23 2015

PASS Board Update: April 2015

It’s been a pretty interesting month on the board.

First, we did have a little problem. I’m sure one or two of you may have noticed that the SQL Saturday web site was down for a moment. Well, three days. Joking aside, this was a serious problem, but my involvement was largely peripheral since I’m in charge of Chapters. I tried to help out a little, offering what support I could and, if nothing else, supplying an ear, willing to listen. Quite a few people took advantage of that. I communicated all their information up the chain to HQ and the other board members. Nothing has been held back.

Next, we’ve started the budgeting process. That’s fascinating. As well as a giant pain in the… well, anyway. Thankfully the people at HQ are helping shepherd me through the process.

We’ve had a number of leadership changes at different PASS Chapters. A couple of new groups have been created. We’re making some progress there.

Also this month I went to the PASS Business Analytics Conference.

Let’s talk about this.

The event was really well put together. The keynotes were amazing. You can read about them here and here.  The venue was great. The speakers I saw seemed good (for the most part). We hosted a number of focus group discussions with attendees and speakers to get feedback on the event. Overall, they were extremely happy with it. They also provided a lot of great suggestions to help improve the event if we do it again.

And there is the question. Do we do this again?

I’m not sure.

If you saw my tweets during the event, I was very supportive. I was also actively promoting the event in the weeks leading up to it. All this is part of my responsibilities as a board member. We were committed to an event and I’m going to help the organization ensure that event is successful. Period.

However, if you had asked me whether I would support doing this again next year, prior to going to the event, I would have said no. Now, I’m not sure. The fact is, there’s a split between the perfect BAC audience member and the perfect Summit audience member. That’s not saying that there’s not crossover. There very much is. But, my concern is, can we, the PASS organization, properly support a community that is business focused using the tools we have for supporting a technology focused community? Should we?

From all the feedback, we supplied an event that people liked and cared about and, most importantly, would suggest to others. So we probably can support a business focused community. Should we?

For myself, and all this is just me reporting to you, I think that the big question is: does this help our existing membership? Prior to this event, I would have said absolutely not. I would have argued that there was not a path from DBA to analyst. After talking to lots and lots of people at the event, I found that a healthy number of our peers have moved from being a DBA or BI person to being a data analyst. It goes something like “Hey! You know that data stuff, right? Well we need someone to do this data analysis thingymabob and you just volunteered.”

In short, many data analysts are accidental data analysts and they come from our peers, our community, in fact, our #sqlfamily. Now, should we do this thing? I’m on the fence still, but I don’t mind admitting, I’m wavering towards “Hell, Yes.”

Apr 21 2015

Query Hash Formula Changes Between Versions

The question came up on SQL Server Central that someone wanted to track queries on their 2008R2 instance and on their 2014 instance in order to validate performance after an upgrade. The plan was to use query hash values. I was curious, so I set up a quick test. First, I spun up two VMs in Azure, both A1, both Enterprise, but one for 2008R2 and one for 2014. Then, I grabbed a generic query that runs against DMVs:

SELECT TOP 10
        SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,
                  (CASE deqs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(dest.text)
                       ELSE deqs.statement_end_offset - deqs.statement_start_offset
                   END) / 2 + 1) AS querystatement,
        deqp.query_plan,
        deqs.query_hash,
        deqs.execution_count,
        deqs.last_elapsed_time,
        deqs.last_logical_reads,
        deqs.last_logical_writes,
        deqs.last_worker_time,
        deqs.max_elapsed_time,
        deqs.max_logical_reads,
        deqs.max_logical_writes,
        deqs.max_worker_time,
        deqs.total_elapsed_time,
        deqs.total_logical_reads,
        deqs.total_logical_writes,
        deqs.total_worker_time
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY deqs.total_elapsed_time DESC;

I ran the query and captured execution plans on both servers. No changes, just defaults. The output is here:

0x40B8F425DDC3D692 -- 2014
0x958B3949E630C004 -- 2008R2

I don’t have access to the code at Microsoft, but I think it’s pretty safe to say that the formula for creating the hash value for a given query has changed between versions. This isn’t surprising, but it is a little disappointing. It would have meant an easy way to compare performance between queries on two different systems. Now, you’ll have to compare based on the T-SQL text, which just isn’t as easy and clean.
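
If you do need to line the queries up across the two instances, one rough approach (my own sketch, not from the original question) is to capture the statement text and stats from each server into tables and join on the text instead of the hash. The table names here are hypothetical:

-- Assuming the output of the DMV query above has been saved from each
-- instance into dbo.QueryStats2008R2 and dbo.QueryStats2014 (made-up names),
-- match the statements on their text rather than on query_hash.
SELECT  old.querystatement,
        old.total_elapsed_time AS elapsed_2008r2,
        new.total_elapsed_time AS elapsed_2014
FROM    dbo.QueryStats2008R2 AS old
JOIN    dbo.QueryStats2014 AS new
        ON new.querystatement = old.querystatement;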

Apr 16 2015

Azure SQL Database v12 and SQL Magazine

I spend many of my evenings researching and writing. Sometimes it’s writing new books. Sometimes it’s fixing and rewriting old books. Occasionally it’s a blog post like this one. Lately, it’s been a series of articles for SQL Magazine that are all about the new functionality available in Azure SQL Database v12 that was released in February for most data centers in Azure. It’s a whole new ball game. Check out my introductory article for v12 and the one on DMVs in Azure. I have more coming up on CLR in Azure, getting started, PowerShell, DBCC, T-SQL enhancements, Premium Tier and more. I’ll also get into “old” functionality like backup and restore. I’ll also explore new functionality, Azure is nothing if not dynamic, as it becomes widely available.

I know a lot of you have been putting off exploring Azure, especially Azure SQL Database until it was “ready.” Guess what? It’s that time. Feel free to lean on me here, and over at SQL Mag, for questions, suggestions, thoughts, concerns, anything you have about Azure.

Apr 13 2015

Azure SQL Database Firewall Settings

The new portal for managing Azure is pretty. I’m not sure I’m in love with it, but it’s pretty.

However, one thing that I have to access regularly is the firewall settings for my Azure SQL Database. I do demos from all over the place. I never know what my IP address is going to be. Rather than expose everything, I just set up whatever IP address I’m on and then remove it later. The old portal made this easy. The new one… not so much.

So, let’s get this down real quick. Assuming you connect to the new portal and go straight to your database, you’ll see this image showing you the database and the server it’s on:

DatabaseAndServer

You won’t see anything else that suggests FIREWALL. But, you can click on the server. When you do, you’ll see another panel open up to the right of the one you’re on. It’ll have an image like this:

Server

Still nothing that screams FIREWALL, but if you click on the little gear icon that says SETTINGS you’ll get yet another panel opening to the right that resembles this image:

ServerSettings

There she blows, Firewall. Click on that and you’ll see the standard screen for editing your IP address access into the firewall:

ServerSettingsFirewall

Just remember that after adding a new IP address to your firewall you must hit the save icon at the top of the screen, or you still won’t be able to access your database.
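
If you’d rather not click through the portal at all, the firewall rules can also be managed from a query window using the firewall stored procedures in Azure SQL Database. Here’s a quick sketch; the rule name and IP address are just examples:

-- Run in the master database of the Azure SQL Database logical server.
-- Adds (or updates) a server-level rule for a single IP address.
EXEC sp_set_firewall_rule
    @name = N'DemoLaptop',                -- any rule name you like
    @start_ip_address = '203.0.113.42',   -- the IP you're presenting from
    @end_ip_address = '203.0.113.42';

-- Remove the rule again when the demo is over.
EXEC sp_delete_firewall_rule @name = N'DemoLaptop';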

Apr 10 2015

I Am Grateful For

I decided in January that I would write regularly about people that I’m grateful for. Now it’s April. Oops.

The concepts are simple and science based. Positive emotions help your clarity of thought. Further, your moods and actions extend out through your network to the third layer. With that in mind, since I want to think more clearly and the most direct path to positive thoughts and emotions is gratitude, I’m attempting to focus and publicize my gratitude by sharing it publicly through these blog posts (in short, I’m trying to tune my brain like I would tune a query).

I am grateful for Tim Ford (b|t).

Tim is a great guy. Further, Tim can be patient with thick-headed dorks (raising my hand). For example, among all the other stuff Tim does (Board of Directors for PASS, Editor at SQL Mag, Track organizer at Connections), he organizes and runs SQL Cruise. Yeah, yeah, cue the laugh track. You don’t get it. SQL Cruise changes people’s lives. I’ve seen it. Now, the first time I went, I had a blast, but, I’m a bit thick, so I didn’t understand everything I was seeing. What I knew was that I didn’t tweet or write a blog post for a week, therefore I must not be doing my job. So, I asked Red Gate to not send me on any more cruises. We skipped them for a while. Last year, with a lot of prodding from Kevin Kline (another person I’m grateful for, but one blog post at a time), I went on the cruise again. Yeah, sun, rum, beaches… Oh, and people doing intense networking, redirecting their careers, going through intense, long form sessions with great speakers (and me)… Whoa! I spent some time talking with Tim and his wife Amy and the light finally dawned. Tim isn’t ONLY finding a way to get a bunch of us out on a boat so we can have fun (yeah, that’s part of it), he’s literally building and engineering opportunities for people to recharge and reset in preparation for radically modifying their career (not their jobs, people after SQL Cruise have careers). And it’s done the same thing for me. I’ve been personally reinvigorated by the work that Tim has done and I’m immensely grateful for that. But it’s not just SQL Cruise. Tim has helped me personally and professionally because he’s just that sort of person. He’s one of a long list of people that I’m regularly gobsmacked are willing to hang out with me.

Thank you Tim.

Apr 07 2015

Error: Unknown Property ismemoryoptimized

If you’re starting the process of moving your databases in Azure SQL Database to v12, you need to do one thing. Make sure you have SQL Server 2014 CU5 or better installed on your machine with Management Studio (current list of updates). Otherwise, like me, you’ll be looking at the above error.

Just a quick blog post to help out. I saw this error, did a search, and didn’t hit a single entry telling me what to do. I started the install of CU6 (I needed to catch up on cumulative updates anyway). While that was happening, I went to Twitter and posted to #sqlhelp to see if anyone else had hit this. I got a response from Amit Banerjee pointing me to this MSDN blog on the topic, reinforcing the decision I had already made. I just wish they had posted the error along with everything else in the blog post. It would make things easier.

Apr 06 2015

Constraints and SELECT Statements

I’ve posted previously about how a foreign key constraint can change how a SELECT query behaves. Logically, that just makes sense. But other types of constraints don’t affect execution plans, do they?

Yes.

Let’s take this constraint as an example:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK 
ADD  CONSTRAINT CK_SalesOrderDetail_UnitPrice 
CHECK  ((UnitPrice>=(0.00)))

That will ensure that no values less than zero can slip in there. We can even validate it:

INSERT Sales.SalesOrderDetail
        (SalesOrderID,
         CarrierTrackingNumber,
         OrderQty,
         ProductID,
         SpecialOfferID,
         UnitPrice,
         UnitPriceDiscount,
         rowguid,
         ModifiedDate
        )
VALUES  (60176, -- SalesOrderID - int
         N'XYZ123', -- CarrierTrackingNumber - nvarchar(25)
         1, -- OrderQty - smallint
         873, -- ProductID - int
         1, -- SpecialOfferID - int
         -22, -- UnitPrice - money
         0.0, -- UnitPriceDiscount - money
         NEWID(), -- rowguid - uniqueidentifier
         GETDATE()  -- ModifiedDate - datetime
        );

Will give me an error:

Msg 547, Level 16, State 0, Line 470
The INSERT statement conflicted with the CHECK constraint "CK_SalesOrderDetail_UnitPrice". The conflict occurred in database "AdventureWorks2014", table "Sales.SalesOrderDetail", column 'UnitPrice'.

Let’s look at a SELECT query now. If we run this:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.';

The resulting execution plan looks like this:

Constraint

But, if I modify the query to look like this, adding an additional AND filter on the constrained UnitPrice column:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.'
        AND sod.UnitPrice > $0.0;

You know what happens to the execution plan? Nothing. It stays exactly the same. The optimizer knows that in order to satisfy the query, it can safely ignore the change in the WHERE clause. In fact, you can look at the SELECT operator properties for the two different plans and note that while the Query Hash value changes, the Plan Hash value stays the same. The plans are identical.
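
You can see the same thing from the plan cache rather than the SELECT operator properties. This is my own quick check, not part of the original demo, comparing query_hash and query_plan_hash for the two statements:

-- After running both versions of the query, the two rows returned here should
-- show different query_hash values but the same query_plan_hash value.
SELECT  dest.text,
        deqs.query_hash,
        deqs.query_plan_hash
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text LIKE '%Water Bottle%'
        AND dest.text NOT LIKE '%dm_exec_query_stats%';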

With that knowledge, I’m going to modify the query to look like this, reversing the UnitPrice reference to look for data that violates the constraint:

SELECT  soh.OrderDate,
        soh.ShipDate,
        sod.OrderQty,
        sod.UnitPrice,
        p.Name AS ProductName
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   p.Name = 'Water Bottle - 30 oz.'
        AND sod.UnitPrice < $0.0;

And now we have a new execution plan:

Constraint_scan


The optimizer recognized that there is no way that any data can be returned with the WHERE clause above because there is an enforced constraint (note the use of the WITH CHECK clause on the constraint). This completely changes the execution plan in every possible way. Now, instead of attempting to access the data, a Constant Scan operator is put in as a placeholder for an empty result set.

To sum up, yes, constraints absolutely affect the choices made by the optimizer when those constraints would have an effect on the plan, even for a SELECT query. Also, I would argue, this means that the use of enforced constraints can be a performance enhancer, since the optimizer can make intelligent choices about how a given query is dealt with.
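
One related check worth making, since the optimizer only gets this benefit from constraints it can trust: here’s a quick look (my own example) at whether the constraint is enabled and trusted:

-- A constraint created or re-enabled WITH CHECK shows is_not_trusted = 0,
-- which is what allows the optimizer to simplify plans like this.
SELECT  cc.name,
        cc.is_disabled,
        cc.is_not_trusted
FROM    sys.check_constraints AS cc
WHERE   cc.name = 'CK_SalesOrderDetail_UnitPrice';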

UPDATE: Fixed a typo that said the constraint prevented data that was equal to or less than zero. It’s only for data less than zero.