View vs. Table Valued Function vs. Multi-Statement Table Valued Function

About five years ago, I was checking an app before it went to production. I hadn’t seen the app before then and a junior dba had worked with the developers designing and building the app. It didn’t use a single stored procedure or view. Instead, it was built entirely of multi-statement UDF’s. These UDF’s called other UDF’s which joined to UDF’s… It was actually a very beautiful design in terms of using the functions more or less like objects within the database. Amazing. It also would not, and could not, perform enough to function, let alone scale. It was a horror because they thought they were done and ready to go to production, but no one had ever tested more than a couple of rows of data in any of the tables. Of course, a couple of rows of data worked just fine. It was when we put in 10, 1000, a few million, that the thing came to a total and complete halt. We spent weeks arguing about the stupid thing. The developers instisted that since it was “possible” to do what they did, that, in fact, it was OK to do what they did.

Anyway, with the help of a Microsoft consultant, we finally cleaned up the app and got it on it’s feet. Ever since then, I’ve preached the dangers of the multi-statement table valued function. The thing to remember is, there are no statistics generated for these things. That means the optimizer thinks they return a single row of data. When they do only return a few rows, everything is fine. When they return even as little as a hundred rows, like the example I’m posting below, they stink.

Anyway, I boiled up this silly example because some developer accused me and several other DBA’s of spreading Fear, Undertainty, and Doubt because we suggested that the multi-statement UDF is something to avoid if possible. Actually, he pretty much all but stated that we didn’t know what we were talking about. I was peeved. Hence this example. Feel free to check it out. Oh, and if you check the execution plans, note that the multi-statement UDF is marked as the least costly even though it actually performs twice as slow as the others. One more example of execution plans being wrong.

Here are the time results from one run of the view & UDF’s:

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.

(99 row(s) affected)

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.

And the code to test for yourself:

CREATE TABLE dbo.Parent
(ParentId int identity(1,1)
,ParentDate datetime)

CREATE TABLE dbo.Child
(ChildId int identity(1,1)
,ParentId int
,ChildDate datetime)

DECLARE @i int
DECLARE @j int
SET @i = 1
SET @j = 1
WHILE @i < 100
BEGIN
INSERT INTO dbo.Parent
(ParentDate)
SELECT GETDATE()
WHILE @j < 100
BEGIN
INSERT INTO dbo.Child
(ParentId
,ChildDate)
SELECT @i
,GETDATE()
SET @j = @j + 1
END
SET @i = @i + 1
END

CREATE VIEW dbo.vJoin
AS
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId

CREATE FUNCTION dbo.SingleUDF ()
RETURNS TABLE
AS
RETURN
(
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
)

CREATE Function dbo.MultiUDF ()
RETURNS @Multi TABLE
(ParentId int
,ParentDate datetime
,ChildId int
,ChildDate datetime)
AS
BEGIN
INSERT INTO @Multi
(ParentId
,ParentDate
,ChildId
,ChildDate)
SELECT p.ParentId
,p.ParentDate
,c.ChildId
,C.ChildDate
FROM dbo.Parent p
JOIN dbo.Child c
ON p.ParentId = c.ParentId
RETURN
END

set statistics time on
select * from vJoin
select * from SingleUDF()
select * from MultiUDF()
set statistics time off

UPDATE -- Edited the formatting.

18 thoughts on “View vs. Table Valued Function vs. Multi-Statement Table Valued Function

  • Actually, I’m in the multi-statement-table-function crowd, but with a caveat: I always specify a primary key on the table to be returned.

    With the primary key in place, I find these functions to be significantly faster than views or table-valued functions, especially as the data set grows larger.

  • Am I missing something… all have the same reading ?
    Does this nullify the statement above ?

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    (99 row(s) affected)

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    (99 row(s) affected)

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.

    (99 row(s) affected)

  • Okay now this was with a 1098 … udf win so far …
    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 3 ms.

    (1098 row(s) affected)

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 2 ms.

    (1098 row(s) affected)

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 2 ms.

    (1098 row(s) affected)

  • Okay so my cpu bit the dust bit stil it would seem that udf is faster ? Or am I missing something ? Maybe with computer having more and more ram udf might actualy be better ?

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 109 ms, elapsed time = 882 ms.

    (101097 row(s) affected)

    SQL Server Execution Times:
    CPU time = 125 ms, elapsed time = 844 ms.

    (101097 row(s) affected)

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 589 ms.

    (101097 row(s) affected)

  • tof

    Hi there,

    Great article! I know you wrote this a while ago, but there was a little bug in the code. You forgot to reset @j in the outer loop which made the query insert only the first 100 records in the table. I fixed that and created a table with about a million records in it. Please find the results below:

    View
    —-
    CPU time = 1482 ms, elapsed time = 10633 ms
    Table ‘Child’. Scan count 1, logical reads 3281
    sub-tree cost: 11.24 (19%)
    inline UDF
    ———-
    CPU time = 1201 ms, elapsed time = 11551 ms.
    Table ‘Child’. Scan count 1, logical reads 3281
    sub-tree cost: 11.24 (19%)

    Multi-valued UDF
    —————–
    CPU time = 6256 ms, elapsed time = 27981 ms.
    Table ‘#1273C1CD’. Scan count 1, logical reads 4118, physical reads 63, read-ahead reads 3745
    sub-tree cost: 36.76 (62%)

  • Another interesting observation. If I Display Estimated Execution Plan on the three select statements, I get the vJoin as 26%, SingleUDF as 26% and MultiUDF as two queries of 0% and 47% respectively. The last query plan shows all the query parts. If I use Include Actual Plan, I get three query plans of 50%, 50% and 0%.

  • SAinCA

    Perhaps you’d re-do this with more real-world configurations…

    Maybe, as most of us tend to code, you’d use local variable IDs for Parent and Child and re-code the UDFs to be parameterized, then run the full-set, parent-set and parent-child-explicit test cases with far more data than 100 rows…

    For example, minus the extra data and trying to remove bias by running the selects in different sequences to try to minimize the impact of the cache/fetch (likely could be vastly improved upon):

    BEGIN TRAN
    SET NOCOUNT on
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MultiUDF]’) AND type in (N’FN’, N’IF’, N’TF’, N’FS’, N’FT’))
    DROP FUNCTION [dbo].[MultiUDF]
    GO

    CREATE TABLE dbo.Parent
    (ParentId int identity(1,1)
    ,ParentDate datetime)

    CREATE TABLE dbo.Child
    (ChildId int identity(1,1)
    ,ParentId int
    ,ChildDate datetime)

    DECLARE @i int
    DECLARE @j int
    SET @i = 1
    SET @j = 1
    WHILE @i < 100
    BEGIN
    INSERT INTO dbo.Parent
    (ParentDate)
    SELECT GETDATE()
    WHILE @j < 100
    BEGIN
    INSERT INTO dbo.Child
    (ParentId
    ,ChildDate)
    SELECT @i
    ,GETDATE()
    SET @j = @j + 1
    END
    SET @i = @i + 1
    END
    go
    CREATE VIEW dbo.vJoin
    AS
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,C.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    go
    CREATE FUNCTION dbo.SingleUDF (@ParentID int, @ChildID int)
    RETURNS TABLE
    AS
    RETURN
    (
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,C.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
    AND c.ChildID = ISNULL(@ChildID,c.ChildID)
    )
    go
    CREATE Function dbo.MultiUDF (@ParentID int, @ChildID int)
    RETURNS @Multi TABLE
    (ParentId int
    ,ParentDate datetime
    ,ChildId int
    ,ChildDate datetime)
    AS
    BEGIN
    INSERT INTO @Multi
    (ParentId
    ,ParentDate
    ,ChildId
    ,ChildDate)
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,C.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
    AND c.ChildID = ISNULL(@ChildID,c.ChildID)
    RETURN
    END
    go
    DECLARE @ParentID int
    DECLARE @ChildID int
    set statistics time on
    select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
    set statistics time off
    SET @ParentID = 1
    set statistics time on
    select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
    set statistics time off
    SET @ChildID = 50
    set statistics time on
    select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
    set statistics time off
    RAISERROR(‘Done VIEW P and C’,0,1) WITH NOWAIT
    SET @ParentID = NULL
    SET @ChildID = NULL
    set statistics time on
    select * from SingleUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ParentID = 1
    set statistics time on
    select * from SingleUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ChildID = 50
    set statistics time on
    select * from SingleUDF(@ParentID,@ChildID)
    set statistics time off
    RAISERROR(‘Done SingleUDF P and C’,0,1) WITH NOWAIT
    SET @ParentID = NULL
    SET @ChildID = NULL
    set statistics time on
    select * from MultiUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ParentID = 1
    set statistics time on
    select * from MultiUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ChildID = 50
    set statistics time on
    select * from MultiUDF(@ParentID,@ChildID)
    set statistics time off
    RAISERROR(‘Done MultiUDF P and C’,0,1) WITH NOWAIT

    ROLLBACK
    RAISERROR(‘************************************************************************’,0,1) WITH NOWAIT
    BEGIN TRAN
    SET NOCOUNT on
    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MultiUDF]’) AND type in (N’FN’, N’IF’, N’TF’, N’FS’, N’FT’))
    DROP FUNCTION [dbo].[MultiUDF]
    GO

    CREATE TABLE dbo.Parent
    (ParentId int identity(1,1)
    ,ParentDate datetime)

    CREATE TABLE dbo.Child
    (ChildId int identity(1,1)
    ,ParentId int
    ,ChildDate datetime)

    DECLARE @i int
    DECLARE @j int
    SET @i = 1
    SET @j = 1
    WHILE @i < 100
    BEGIN
    INSERT INTO dbo.Parent
    (ParentDate)
    SELECT GETDATE()
    WHILE @j < 100
    BEGIN
    INSERT INTO dbo.Child
    (ParentId
    ,ChildDate)
    SELECT @i
    ,GETDATE()
    SET @j = @j + 1
    END
    SET @i = @i + 1
    END
    go
    CREATE VIEW dbo.vJoin
    AS
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,C.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    go
    CREATE FUNCTION dbo.SingleUDF (@ParentID int, @ChildID int)
    RETURNS TABLE
    AS
    RETURN
    (
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,C.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
    AND c.ChildID = ISNULL(@ChildID,c.ChildID)
    )
    go
    CREATE Function dbo.MultiUDF (@ParentID int, @ChildID int)
    RETURNS @Multi TABLE
    (ParentId int
    ,ParentDate datetime
    ,ChildId int
    ,ChildDate datetime)
    AS
    BEGIN
    INSERT INTO @Multi
    (ParentId
    ,ParentDate
    ,ChildId
    ,ChildDate)
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,C.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    WHERE p.ParentID = ISNULL(@ParentID,p.ParentID)
    AND c.ChildID = ISNULL(@ChildID,c.ChildID)
    RETURN
    END
    go
    DECLARE @ParentID int
    DECLARE @ChildID int
    set statistics time on
    select * from SingleUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ParentID = 1
    set statistics time on
    select * from SingleUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ChildID = 50
    set statistics time on
    select * from SingleUDF(@ParentID,@ChildID)
    set statistics time off
    RAISERROR(‘Done SingleUDF P and C’,0,1) WITH NOWAIT
    SET @ParentID = NULL
    SET @ChildID = NULL
    set statistics time on
    select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
    set statistics time off
    SET @ParentID = 1
    set statistics time on
    select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
    set statistics time off
    SET @ChildID = 50
    set statistics time on
    select * from vJoin WHERE ParentID = ISNULL(@ParentID,ParentID) AND ChildID = ISNULL(@ChildID,ChildID)
    set statistics time off
    RAISERROR(‘Done VIEW P and C’,0,1) WITH NOWAIT
    SET @ParentID = NULL
    SET @ChildID = NULL
    set statistics time on
    select * from MultiUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ParentID = 1
    set statistics time on
    select * from MultiUDF(@ParentID,@ChildID)
    set statistics time off
    SET @ChildID = 50
    set statistics time on
    select * from MultiUDF(@ParentID,@ChildID)
    set statistics time off
    RAISERROR(‘Done MultiUDF P and C’,0,1) WITH NOWAIT

    ROLLBACK

    The results on the 100 rows for the SingleUDF are nits different from the qualified View SELECTs.

    Thanks for raising to points you eloquently narrate – it’s made me re-evaluate whether some of the code I have is optimal. My concern with views is the degree to which data are assembled prior to a WHERE condition being applied, hence the "more data than 100 rows" request.

    Cheers!

    • I’m not sure what happened to the formatting. It’s been fine until I saw these new comments come in. I’m working on it.

      It looks like it’s just the comments that are messing up. It’s messed up in IE and Chrome.

      In response to your tests, mine were only to make a point. If I set up real world tests with 10,000+ rows, so we can see how data & structures & statistics really work with multi-statement UDFs, and then nest and JOIN the UDFs as they were originally, these points become much more clear. But the thing is, there are demonstrable (albeit extremely minor) differences at extremely low numbers of rows.

  • GregJF

    I got the same result as you did the first time I created and ran these queries.
    Then I put the queries into another query window and ran them again with stats on and execution plan on
    Stats:
    set statistics time on
    select * from vJoin –TOOK:CPU time = 0 ms, elapsed time = 52 ms.
    select * from SingleUDF() –TOOK:CPU time = 0 ms, elapsed time = 36 ms.
    select * from MultiUDF() –TOOK:CPU time = 0 ms, elapsed time = 52 ms.
    set statistics time off

    Exec Plan:
    select * from vJoin –TOOK:47%
    select * from SingleUDF() –TOOK:47%
    select * from MultiUDF() –TOOK:6%

    I then took SAinCA advise and added a primary key to the MultiUDF function, like

    ALTER Function dbo.MultiUDF ()
    RETURNS @Multi TABLE
    (ParentId int
    ,ParentDate datetime
    ,ChildId int
    ,ChildDate datetime,
    PRIMARY KEY ( ParentId,ChildId ))
    –first 7 lines shown for brevity

    Then I ran the queries again, TWICE
    the first was akin the the first with no PK
    On the second:

    Stats:
    set statistics time on
    select * from vJoin –TOOK:CPU time = 0 ms, elapsed time = 81 ms.
    select * from SingleUDF() –TOOK:CPU time = 0 ms, elapsed time = 46 ms.
    select * from MultiUDF() –TOOK:CPU time = 0 ms, elapsed time = 79 ms.
    set statistics time off

    Exec Plan:
    select * from vJoin –TOOK:47%
    select * from SingleUDF() –TOOK:47%
    select * from MultiUDF() –TOOK:6%

    Then I repeated the above with a a where clause on each of the “select” command

    No PK on MultiUDF (second query run):
    set statistics time on
    select * from vJoin where ParentId = 1 and childID between 25 and 55 –TOOK:CPU time = 0 ms, elapsed time = 32 ms. execPlan = 40%

    select * from SingleUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 46 ms. execPlan = 40%

    select * from MultiUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 1 ms. execPlan = 19%

    set statistics time off

    With PK on MultiUDF (second query run):
    set statistics time on
    select * from vJoin where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 29 ms. execPlan = 40%

    select * from SingleUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 47 ms. execPlan = 40%

    select * from MultiUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 1 ms. execPlan = 19%

    set statistics time off

    So PK or no PK doesn’t matter, but that may depend on the where clause

    But MultiUDF does work much faster than the other 2 when the where clause is a range

    Then I thought what about passing paramters, so I created:
    Create Function dbo.MultiUDFWithParams (@ParentId_1 int, @ParentId_2 int, @ParentDate_1 datetime,@ParentDate_2 datetime,@ChildId_1 int,@ChildId_2 int, @ChildDate_1 datetime ,@ChildDate_2 datetime)
    RETURNS @Multi TABLE
    (ParentId int
    ,ParentDate datetime
    ,ChildId int
    ,ChildDate datetime)
    AS
    BEGIN
    INSERT INTO @Multi
    (ParentId
    ,ParentDate
    ,ChildId
    ,ChildDate)
    SELECT p.ParentId
    ,p.ParentDate
    ,c.ChildId
    ,C.ChildDate
    FROM dbo.Parent p
    JOIN dbo.Child c
    ON p.ParentId = c.ParentId
    where
    p.ParentId between isnull(@ParentId_1,p.ParentId) and isnull(@ParentId_2,p.ParentId)
    AND
    p.ParentDate between isnull(@ParentDate_2,p.ParentDate) and isnull(@ParentDate_2,p.ParentDate)
    AND
    c.ChildId between isnull(@ChildId_1,c.ChildId) and isnull(@ChildId_2,c.ChildId)
    AND
    C.ChildDate between isnull(@ChildDate_1,C.ChildDate) and isnull(@ChildDate_2,C.ChildDate)
    RETURN
    END

    Then ran the queries again:

    set statistics time on
    select * from vJoin where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 27 ms. execPlan = 40%
    select * from SingleUDF() where ParentId = 1 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 49 ms. execPlan = 40%
    select * from MultiUDFWithParams(1, NULL, NULL,NULL, 45,55,NULL, NULL) –TOOK:CPU time = 0 ms, elapsed time = 1 ms. execPlan = 19%
    set statistics time off

    So no real difference.

    There might be a difference if more data is added

  • GregJF

    So I add 100K of rows
    non params:
    set statistics time on
    select * from vJoin where ParentId between 45 and 55 and childID between 45 and 55–TOOK:CPU time = 0 ms, elapsed time = 109 ms. execPlan = 50%
    select * from SingleUDF() where ParentId between 45 and 55 and childID between 45 and 55–TOOK:CPU time = 0 ms, elapsed time = 83 ms. execPlan = 50%
    select * from MultiUDF() where ParentId between 45 and 55 and childID between 45 and 55–TOOK:CPU time = 0 ms, elapsed time = 256 ms. execPlan = 0%
    set statistics time off

    Maybe there is something in that “incorrect execution plan ” comment from Grant

    then the param UDF

    set statistics time on
    select * from vJoin where ParentId between 45 and 55 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 89 ms. execPlan = 40%
    select * from SingleUDF() where ParentId between 45 and 55 and childID between 45 and 55 –TOOK:CPU time = 0 ms, elapsed time = 87 ms. execPlan = 40%
    select * from MultiUDFWithParams(45, 55, NULL,NULL, 45,55,NULL, NULL) –TOOK:CPU time = 0 ms, elapsed time = 49 ms. execPlan = 2%
    set statistics time off

    param query is much better when there is lots of data

    SingleUDF is most consistent overall

  • The original post was an extremely simplistic example, that, by and large has still held up to what I said originally. I can build a more substantive test at some point when I’ve got some time. I’ve seen consistently poor performance from multi-statement UDFs, as have the majority of query tuning consultants I know.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.