Statistics in Execution Plans

Home / SQL Server 2005 / Statistics in Execution Plans

I was presenting on execution plans when another question came up that I didn’t know the answer to immediately. Yes, I know you’ve seen that phrase before on this blog. I love presenting because you get exactly the kinds of questions that make you think and make you learn. I’m presenting, in part, to learn, just as much as I am to teach. It was the same with kenpo. The more I taught, the better I learned the art. Wait, this isn’t supposed to be a blog post about learning. This one is about statistics.

The question was, does the execution plan have the statistics that were used by the optimizer to decide on the execution plan. And no, what was meant, was not does it show the estimated rows, which come from the statistics, but specifically does it show that it used a set of statistics named X? The answer to that is yes & no. Or, to be more DBAish about it, it depends.

Let’s take a really simple query run against a freshly installed copy of AdventureWorks2008R2:

[sourcecode language=”sql”]SELECT p.BusinessEntityID,
p.FirstName
FROM Person.Person AS p
WHERE p.FirstName LIKE ‘Toni%’;[/sourcecode]

This query generates this execution plan:

Stats

Yes, a very sophisticated and hard to understand execution plan. Now, here’s the deal, there were two sets of statistics used to make this plan, but only one of them can be seen in the plan. See any statistics there? Sure you do. IX_Person_LastName_FirstName_MiddleName. That is one of the two sets of statistics that were used to make this execution plan. Where’s the second set? Not in the execution plan.

[sourcecode language=”sql”]sp_helpstats N’Person.Person’, ‘ALL’;[/sourcecode]

The results are here:

statslist

There is the second set of statistics used for this query, right at the top. Because I was searching the FirstName column, the optimizer found that it did not have the statistics it needed, so they were created, on the fly, and then, were not a part of the execution plan. Further, I probably looked at other statistics such as the PK_Person_BusinessEntityID because that’s the clustered index for the table. It could have scanned that to get the list of values just as easily as the other index. But, that other index is probably smaller, which means fewer pages scanned.

So, back to the question, can you see the statistics used by the optimizer inside the execution plan? Some of them, yes, but not all of them.

Please, if I’m presenting, ask questions. I’ll know a few of the answers, right off the top of my head. Others will make me go and learn so that I can answer the next person who asks the same question. I don’t mind losing at a game of Stump the Chump, so let’s play.

15 Comments

  • It’s irritating, because they must be stored somewhere in the internal representation of the plan so that the pre-execution validation checks know what stats need to be checked for updates.

  • Cool! I learned something new. I didn’t know about sp_helpstats. I looked it up in BOL, and it says, “This feature will be removed in the next version of Microsoft SQL Server…To obtain information about statistics, query the sys.stats and sys.stats_columns catalog views.” Do you know how you would find the info using those?

  • Ya, an optimizer decision tree log to show which stats were used and decision results would be awesome. I smell a trace flag! I’d love to see a history of ‘found CI PK_My_Index, decided not to use it because of Reason Code x: Covering NCI idx_FirstName found.” “Analyzed NCI idx_FirstName, need FirstName, all columns found.” “Used full Scan due to use of LIKE.”. Or whatever. I know the steps and decisions are far more exhaustive for what the optimizer decides and why, but it would be cool to peer inside.

  • Here’s one alternative to sp_helpstats

    SELECT OBJECT_NAME(object_id), name, auto_created, user_created, no_recompute, has_filter, filter_definition, LEFT(list,LEN(list)-1)
    FROM sys.stats AS s
    CROSS APPLY (
    SELECT
    name + ‘, ‘ AS [text()]
    FROM
    sys.stats_columns AS sc INNER JOIN sys.columns AS c ON sc.object_id = c.object_id AND sc.column_id = c.column_id
    WHERE
    s.object_id = sc.object_id AND s.stats_id = sc.stats_id
    ORDER BY
    stats_column_id
    FOR XML PATH(”)
    ) columnlist (list)

  • James Lean

    @Jes I knocked this up quickly, think it looks ok:

    SELECT s.name AS statistics_name,
    STUFF
    (
    (
    SELECT ‘, ‘ + c.name AS [text()]
    FROM sys.stats_columns sc
    JOIN sys.columns c ON sc.object_id = c.object_id
    AND sc.column_id = c.column_id
    WHERE sc.object_id = s.object_id
    AND sc.stats_id = s.stats_id
    ORDER BY sc.stats_column_id
    FOR XML PATH (”)
    ),
    1,
    2,

    ) AS statistics_keys
    FROM sys.stats s
    WHERE s.object_id = OBJECT_ID(‘Person.Person’)
    ORDER BY s.name

  • The trace flags you are after are 9204 (statistics loaded) and/or 9292 (statistics header loaded). You also need 3604 to redirect the output to the Messages window.

    Text output only appears when a compilation or recompilation occurs – so clear the plan from cache first, dump the plan cache completely, or use OPTION (RECOMPILE) – though that may change the plan of course.

    You can also use another undocumented option to enable the trace flags just for the statement, e.g. for stats headers:

    SELECT
    p.BusinessEntityID,
    p.FirstName
    FROM Person.Person AS p
    WHERE
    p.FirstName LIKE ‘Toni%’
    OPTION
    (
    QUERYTRACEON 3604,
    QUERYTRACEON 9292,
    RECOMPILE
    );

    Cheers,

    Paul

  • Thanks Paul!

    I figured that there must be some undocumented trace flags for this since the SQL Engine/optimizer QA department needs something to ensure the engine behaves properly and decides accurately with each release/Service Pack. You certainly can’t leave that to the promises of the developers, can you? P’scha. I know they are not using trial and error.

    Outstanding find here. Nice work and thanks for reporting. I look forward to playing with this in my lab.

OK, fine, but what do you think?