Common Table Expressions Are Not Tables

There’s power in naming things. Supposedly some types of magic are even based on knowing the correct names for things. The name for the T-SQL clause Common Table Expression (CTE) is actually pretty accurate. It’s an expression that looks like a table and can be used in common across the entire query (at least I think that’s what the common part refers to). But note, I didn’t say it was a table. It’s not. It’s an expression. If you look at the T-SQL definition at the link, it refers to a “temporary” result set. Now, to a lot of people, that means table. But it isn’t. Let’s look at this in more detail.

Here’s a query that defines a simple CTE and then uses it to query the date in the next T-SQL statement:

WITH    x AS (SELECT    soh.OrderDate,
                        soh.SalesOrderID,
                        sod.LineTotal,
						sod.ProductID
              FROM      Sales.SalesOrderHeader AS soh
              JOIN      Sales.SalesOrderDetail AS sod
                        ON sod.SalesOrderID = soh.SalesOrderID
              WHERE     soh.SalesOrderID BETWEEN 43683 AND 43883
             )
    SELECT  x.OrderDate,
            x.LineTotal,
            p.Name
    FROM    x
    JOIN    Production.Product AS p
            ON p.ProductID = x.ProductID;

If we run this, we get the following execution plan:

CommonExecPlan

Let’s not worry about the plan for the moment. Instead, I want to look at a couple of more queries:

--Derived Table
SELECT  x.OrderDate,
        x.LineTotal,
        p.Name
FROM    (SELECT soh.OrderDate,
                soh.SalesOrderID,
                sod.LineTotal,
				sod.ProductID
         FROM   Sales.SalesOrderHeader AS soh
         JOIN   Sales.SalesOrderDetail AS sod
                ON sod.SalesOrderID = soh.SalesOrderID
         WHERE  soh.SalesOrderID BETWEEN 43683 AND 43883
        ) AS x
JOIN    Production.Product AS p
        ON p.ProductID = x.ProductID;

--Just a JOIN
SELECT  soh.OrderDate,
        sod.LineTotal,
        p.Name
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
JOIN    Production.Product AS p
        ON p.ProductID = sod.ProductID
WHERE   soh.SalesOrderID BETWEEN 43683 AND 43883;

Three, nominally, different queries. Yet, if you run these queries, all of them return exactly the same data and all of them have exactly the same execution plan. That’s right, the CTE didn’t change the way that SQL Server derived these queries in any way. Nor would it. That’s because, the CTE is absolutely not a table. It’s an expression.

That name appears to hold magic. We see the word table and we think “Tables, I know all about them” and we proceed to start treating our CTEs as if they were tables, but they’re not. And Microsoft’s definition is completely accurate, if maybe just a touch misleading. These are not tables we’re dealing with when we work with Common Table Expressions.

Let’s look at the difference:

--Table Valued Parameter
DECLARE @TVP AS TABLE
(OrderDate DATETIME,
SalesOrderID INT,
LineTotal NUMERIC(38,6),
ProductID INT,
INDEX tt_ci CLUSTERED (SalesOrderID),
INDEX tt_i NONCLUSTERED(ProductID));

INSERT @TVP
        (OrderDate,
         SalesOrderID,
         LineTotal,
         ProductID
        )
SELECT soh.OrderDate,
                soh.SalesOrderID,
                sod.LineTotal,
				sod.ProductID
         FROM   Sales.SalesOrderHeader AS soh
         JOIN   Sales.SalesOrderDetail AS sod
                ON sod.SalesOrderID = soh.SalesOrderID
         WHERE  soh.SalesOrderID BETWEEN 43683 AND 43883;


SELECT  x.OrderDate,
        x.LineTotal,
        p.Name
FROM    @TVP AS x
JOIN    Production.Product AS p
        ON p.ProductID = x.ProductID;



--Temporary Table
CREATE TABLE #TT
(OrderDate DATETIME,
SalesOrderID INT,
LineTotal NUMERIC(38,6),
ProductID INT,
INDEX tt_ci CLUSTERED (SalesOrderID),
INDEX tt_i NONCLUSTERED (ProductID));

INSERT #TT
        (OrderDate,
         SalesOrderID,
         LineTotal,
         ProductID
        )
SELECT  soh.OrderDate,
                soh.SalesOrderID,
                sod.LineTotal,
				sod.ProductID
         FROM   Sales.SalesOrderHeader AS soh
         JOIN   Sales.SalesOrderDetail AS sod
                ON sod.SalesOrderID = soh.SalesOrderID
         WHERE  soh.SalesOrderID BETWEEN 43683 AND 43883;


SELECT  x.OrderDate,
        x.LineTotal,
        p.Name
FROM    #TT AS x
JOIN    Production.Product AS p
        ON p.ProductID = x.ProductID;

These two queries are using table objects, a table valued parameter and a temporary table. The execution plans are decidedly different from the one above and each other:

TableExecPlans

Interesting enough it chose to do a Merge instead of the Nested Loops and ordered the table scan to accomplish it. Yet, that query runs faster and has less reads than all the others… As long as you don’t count the time to load the temporary table. Then it just doesn’t work as well as the others, at all. Speaking of execution times, the first three queries all had exactly 27 reads and, within some variations, ran in about 30ms on my system. Anyway, query tuning is not the point of the discussion.

You can see the differences in the execution plans that you get between an expression, a CTE, and actual temporary storage, either table variables or temporary tables. Yet, you can also see that the CTE, an expression, just an expression, was resolved in the exact same way as the derived table and the simple JOIN. In short, a common table expression is not a table. Absolutely not. It’s an expression.

On a side-note, if you’re thinking of attending SQL Rally Nordic 2015 in Copenhagen and you want to spend some time talking about query tuning, I’m doing a full day pre-conference seminar. Click here now to register.

13 thoughts on “Common Table Expressions Are Not Tables

  • Erik

    You should note that if you join a CTE to itself, it will execute the expression again. I learned that recently and… regretted some past choices.

    The execution plan for this will show two scans.

    WITH c1
    AS ( SELECT [ProductID] ,
    [Name] ,
    [ProductNumber]
    FROM [Production].[Product]
    )
    SELECT ca.ProductID ,
    ca.Name ,
    ca.ProductNumber
    FROM c1 ca
    JOIN c1 cb ON cb.ProductID = ca.ProductID

  • Oh yeah, absolutely. I wasn’t getting into what happens with recursion and multiple references. But, it’s still not a table. It’s just two queries instead of just one. Fine if that’s what you need. Horrible if you thought you were joining two tables together.

  • Erik

    Exactly, and I think that’s a good demonstration of why they’re not tables. You don’t have to populate a temp table twice to join to it, nor can you say that three times fast with any measure of comfort.

  • Joe Celko

    It is worth a mention that in other SQL products, a CTE might be materialized if the optimizer sees that it is used more than once or would have some other advantage.

  • Koen Verbeeck

    > A view is just a query so…

    Yeah, but the metadata of the view is explicitly stored in the database (sorry sorry, I know being pedantic is frowned upon 😉

    What I like (or find curious) is that you can “update” a CTE. Something like:

    WITH CTE_BaseTable AS
    (SELECT ProductName, Color FROM dbo.Products)
    UPDATE CTE_BaseTable
    SET Color = ‘Red’
    WHERE ProductName = ‘Gate’;

  • Curt Coker

    The timeliness of this post on sqlservercentral.com today is amazing. Yesterday I was thinking that there should be an enhancement to ANSI SQL language that allows the SQL developer to specify when to materialize/cache a CTE. It would look something like:
    WITH cte_name (col_list) AS CACHED (SELECT col_list FROM table_name)
    etc.
    This would basically tell the plan generator, “do this SELECT first, put it in a temp table, use the result set in the rest of the query, and then destroy the temp table.” No muss, no fuss. Only caveat is that I am giving up the ability to declare a PK or create an index, but maybe that is another enhancement.

  • Sam

    @Curt

    Surely you just mean temporary tables but with MORE typing and no possibility of using the same cached data in more than one query?
    The only benefit I can see is that it would be all wrapped up in a single query.

    SELECT col_list INTO #cte_name FROM table_name
    (Add index, PK etc…)
    etc…
    etc…
    etc…
    DROP TABLE #cte_name

  • Curt Coker

    @Sam

    If you use a CTE then by definition you don’t want re-use. Part of the appeal of a CTE is limited scope and lack of persistence (e.g. no DROP TABLE to worry about). When it is more appropriate to use a temp table then use a temp table.

    • Jeff Moden

      @Curt… you don’t need to use DROP TABLE on temp tables in a stored procedure or even in a session. The temp tables automatically drop when the session ends or the “lesser scope” ends. The meta data is automatically preserved a bit in case the stored procedure is soon used again to improve performance, where a CTE will not, even if you trick it into “pre-materializing” with a “blocking operator”, which is another performance trick that can be used in some cases.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.