Statistics and Natively Compiled Procedures

02Jul 2014 by Grant Fritchey 5 Comments

Statistics are one of the single most important driving factors for the behavior of the query optimizer. The cardinality estimates stored within the statistics drive costing and costing drives the decision making of the optimizer. So, how does this work with the new SQL Server 2014 natively compiled procedures? Differently.

In-memory tables do not maintain their statistics automatically. Further, you can’t run DBCC SHOW_STATISTICS to get information about those statistics, so you can’t tell if they’re out of date or not or what the distribution of the data is within them. So, if I create some memory optimized tables, skip loading any data into themÂ and then run this standardÂ query:

SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = 'London';

The estimates in the execution plan for the query show some pretty wild values:

That’s an estimated number of rows of 65,536 and an actual of zero in a table scan because I created my table without an index. If I recreate it with an index, but still no data, the estimates change and we have an index seek operation now:

That’s suggesting that the optimizer thinks there are 256 rows. But there’s no data here. So, let’s load some data into the table. Here are the new estimates from the index seek operator:

I haven’t yet updated the statistics, so the optimizer still thinks there are zero rows in the table, or at least, it has no statistics. Well, not true:

SELECT  s.name,
        s.auto_created,
        s.user_created,
        s.filter_definition,
        sc.column_id,
        c.name AS ColumnName
FROM    sys.stats AS s
        JOIN sys.stats_columns AS sc ON sc.stats_id = s.stats_id
AND sc.object_id = s.object_id
        JOIN sys.columns AS c ON c.column_id = sc.column_id
AND c.object_id = s.object_id
WHERE   s.object_id = OBJECT_ID('dbo.Address');

If I run this, I’ll see statistics on the table, including system generated statistics:

One point, while statistics don’t update automatically, they clearly can still be created automatically. But I can’t run DBCC SHOW_STATISTICS to see what’s in there. So, let’s see what estimates look like in the natively compiled procedure. I’ll take the same query code above and convert it to a proc. Then, when I capture the estimated plan from the procedure (no actual plans allowed), the index seek operator shows these estimates:

Now, we have an accurate estimate on the number of rows this query is likely to return. Interesting. So, let’s drop the procedure, update the statistics and then rerun the query and the procedure. First the query. The estimates don’t change. I’m still seeing estimated values as 256 while the actual is 434. So, let’s free the procedure cache and try again:

Ah, there we go. The plan itself came out the same way, but we clearly have more accurate estimates now. On to the procedure. I’ll recreate it and then get the estimated plan. Here are the estimate values from the same index seek operation:

Oops. Still estimated 0 rows.

What’s all this mean? I’m not sure. The documentation from Microsoft in this area is sketchy. During the most recentÂ 24 Hours of PASS, I was able to ask MicrosoftÂ about the impact of statistics on natively compiled plans. They suggested that it was not necessarily going to be the same as we see in standard queries. These tests make that fairly evident. Also, it looks like the default values of estimated rows for in-memory tables is different. If I create standard tables, empty, and run the same query against them, the estimated number of rows is what I expect, 1. But in the case of in-memory tablesÂ it’s 256 with an index and 65,536 without one (or at least that’s what I’m seeing). However, the estimates for the natively compiled procedure never changed in this test case, always at 0. This is hardly shocking, but it seems that different rules apply for in-memory tables and their statistics as well as natively compiled procedures and their consumption of those statistics. And, as Microsoft has changed the default estimated number of rows for table variables from 1 to 100 in SQL Server 2014, it seems we have another instance where they’re defaulting to an even higher value and one where the values seem to just disappear.

The behavior of statistics within in-memory tablesÂ is extremely interesting because you may see changing behavior with those tablesÂ as your queries get more complex and your data changes. It makes a very strong case for making sure that you update your statistics on a regular basis on these tables.

I’m taking this show on the road. If you want toÂ get an all day class on query tuning,Â I’ve got a lot of opportunities coming up.Â I believe that Albany, on July 25th, is not yet sold out.Â You can register here.Â I’mÂ teaching an all day session at SQL ConnectionsÂ in September in Las Vegas.Â Go here to register for this great event. In BelgiumÂ in October, Iâ€™ll be doing an all day session onÂ execution plansÂ atÂ SQL Server Days.Â Go here to register for this event.Â I’m excited, and more than a little humbled, to get the opportunity to present an all day pre-conference seminar at the PASS Summit in Seattle in November. Go here to register.

5 thoughts on “Statistics and Natively Compiled Procedures”

GP Van Eron

Excellent article. Thanks for posting.

July 3, 2014 at 3:24 pm
(SFTW) SQL Server Links 03/07/14 - John Sansom

[…] Statistics and Natively Compiled ProceduresÂ -Â Grant Fritchey (Blog|Twitter) […]

July 5, 2014 at 8:44 am
M McDonald

I don’t quite follow:

Roughly a third of the way down you have:

Thatâ€™s suggesting that the optimizer thinks there are 256 rows. But thereâ€™s no data here. So, letâ€™s load some data into the table. Here are the new estimates from the index seek operator:

I havenâ€™t yet updated the statistics, so the optimizer still thinks there are zero rows in the table, or at least, it has no statistics. Well, not true:
****the plan still shows an estimate of 256 rows but actual rows as 434****

Continuing further down:

So, letâ€™s see what estimates look like in the natively compiled procedure. Iâ€™ll take the same query code above and convert it to a proc. Then, when I capture the estimated plan from the procedure (no actual plans allowed), the index seek operator shows these estimates:

Now, we have an accurate estimate on the number of rows this query is likely to return. Interesting.

****Image shows estimate of 0 rows. How can that be accurate when you previously loaded the tables but did not delete them? Images incorrect? Further down you mention estimate 256/acutall 434 but the image shows 434 for both…****

I am missing something

Thank you.

February 27, 2015 at 1:18 pm
Grant Fritchey

Nope. You’re not missing anything. The estimates and actuals returned from the query plans are very inconsistent. It’s acting strange and that’s the only answer I have at the moment.

February 27, 2015 at 2:26 pm
M McDonald

Awesome 😉

Thank you.

February 27, 2015 at 2:45 pm

Please let me know what you think about this article or any questions:Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Statistics and Natively Compiled Procedures

Share this:

Related

5 thoughts on “Statistics and Natively Compiled Procedures”

Please let me know what you think about this article or any questions:Cancel reply