Function Vs. Performance

Recently I was looking through DBA.StackExchange when I saw a pretty simple question that I decided to answer. I went off, set up a test database and some test tables and quickly wrote a query to answer the question. I got it all formatted pretty and was on my way to post it when I saw that another answer was already there.

Yeah. Identical to mine. Almost line for line.

Well, nuts.

I know. I’ll write a blog post.

The Setup

The original poster had two tables that, frankly, are badly designed. However, they share enough data that they are “related” if not relational. Here’s the code:

CREATE DATABASE Testing;
GO
USE Testing;
GO

CREATE TABLE Table_A
(
    ID INT IDENTITY(1, 1),
    Score INT
);

CREATE TABLE Table_B
(
    FromPoint INT,
    ToPoint INT,
    RankDesc VARCHAR(50)
);

INSERT dbo.Table_A
(
    Score
)
VALUES
(67),
(569),
(123);

INSERT dbo.Table_B
(
    FromPoint,
    ToPoint,
    RankDesc
)
VALUES
(   1,        -- FromPoint - int
    99,       -- ToPoint - int
    'Top 100' -- RankDesc - varchar(50)
    ),
(   100,      -- FromPoint - int
    499,      -- ToPoint - int
    'Top 500' -- RankDesc - varchar(50)
),
(   500,       -- FromPoint - int
    1000,      -- ToPoint - int
    'Top 1000' -- RankDesc - varchar(50)
);

So the query they wanted was to join the tables and categorize the values from Table_A to the From & To in Table_B to find the rank. So I wrote this query:

SELECT ta.ID,
       tb.RankDesc
FROM dbo.Table_A AS ta
    JOIN dbo.Table_B AS tb
        ON ta.Score
           BETWEEN tb.FromPoint AND tb.ToPoint;

Fully functional. However, you query tuning nuts are already screaming. I can hear them. Let’s talk about it.

Function vs. Performance

Let’s please avoid the premature optimization discussion and just say, sometimes, we can spot things that will be issues, and fixing them before they are issues, sure doesn’t hurt anything.

Here’s the existing execution plan:

No real shocks or surprises in the basic plan shape. Nothing in the structure above supports this query, so we’re seeing table scans (and I didn’t make anything the PK or clustered index, so that’s all on me). The only interesting bit is in the Nested Loops Predicate:

[Testing].[dbo].[Table_A].[Score] as [ta].[Score]>=[Testing].[dbo].[Table_B].[FromPoint] as [tb].[FromPoint] AND [Testing].[dbo].[Table_A].[Score] as [ta].[Score]<=[Testing].[dbo].[Table_B].[ToPoint] as [tb].[ToPoint]

Even though I wrote “BETWEEN”, the optimizer chose to change it to >= and <=. Same difference really, but if you wanted to, ever so slightly, reduce the work the optimizer does, I could rewrite the join like this:

SELECT ta.ID,
       tb.RankDesc
FROM dbo.Table_A AS ta
    JOIN dbo.Table_B AS tb
        ON ta.Score >= tb.FromPoint
           AND ta.Score <= tb.ToPoint;

Over thousands of executions, that might shave a millisecond off the time. No, we need indexes.

Indexes

As with so much of query tuning, the right structure makes the difference. Probably, I need to add a clustered index to these tables to really do them right. However, we’re just going to try a couple of indexes.

First though, I’m going to load up more data so that we’re on multiple pages and the optimizer will spot that indexes I create will help. I’m going to cheat & use Data Generator (hey, I work for Redgate, I’m going to be lazy when I can be). With a lot of data, the execution plan changes:

That done, let’s try this index:

CREATE INDEX ScoreTableA ON dbo.Table_A (Score) INCLUDE (ID);

I intentionally made it covering. Here’s the resulting execution plan:

Well that certainly looks better, but execution plans are NOT a measure of performance. What are the performance metrics, before and after adding this index:

Before:
Reads: 20312
Duration: 27.153ms (side note, I tested w/BETWEEN and it was, on average, 100 mcs slower over 100 executions, so, there's that)

After:
Reads: 44
Duration: 3.76ms

Well, that’s pretty dispositive. I think I can safely say that index helped. 88% faster with 99.998% fewer reads. Seems like a no brainer to me.

Now, are we worried about the table scan of Table_B? In this exact situation, with three rows, and a query without a WHERE clause, I’d say no. However, testing is your buddy. Let’s create this index:

CREATE INDEX FromAndTo
ON dbo.Table_B (
                   FromPoint,
                   ToPoint
               )
INCLUDE (RankDesc);

That changes the execution plan:

function vs. performance, function wins

Performance now looks like this:

Reads: 40
Duration: 2.057ms

It helps a bit. Not earth shattering, but, still. Honestly, I’d call this index, premature optimization. In fact, it might be unnecessary optimization depending on how frequently this stuff gets called, changes to the data over time, etc.. Another thing to look at, for both indexes, is possibly making them clustered (arguments to be had since the one on Table_A may not be unique), but I’ll let you test that one on your own.

Conclusion

I was, initially, much more interested in creating functional code. But then, when I couldn’t post it as an answer, because someone beat me to it, I decided I had to do something with it. Why not tune it?

Now, all of query tuning isn’t about adding indexes. However, when you can look at a bit of code, along with the structure, and easily, quickly, spot that it’s going to perform badly without an index, well, adding one just makes sense. When it’s down to function vs. performance, sometimes I’d say that it’s not functioning until it reaches a minimum level of performance. We went from 20,312 reads to 44 then 40. That’s not merely premature optimization. That’s functionally correct.

2 thoughts on “Function Vs. Performance

  • Sometimes we don’t have a choice with how our tables are designed such as with 3d party DB’s that we can add to (ie custom views, tables) but that we are not allowed to edit directly like fixing bad table designs. I am in that exact position of having to work with what is there and often when posting questions to get help I get (unfairly in my opinion) flack about the design like as if I was the one who did it and it still happens even when I preface that it’s not my design nor one I can change.

    Not saying you are doing that Grant, just pointing out it happens and can be a good way to drive some away from seeking help so best to always understand the person may have no choice but to work with the design as it is.

    • Oh right there with you. I get it. We seldom have much control over in-house built stuff, let alone 3rd party. You’re 100% correct.

      Sadly, that doesn’t change the fact that, at the root, structures and code are the biggies when it comes to performance. Indexing can help (as shown), but it’s not always going to be an easy win.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.