Statistics are one of the single most important driving factors for the behavior of the query optimizer. The cardinality estimates stored within the statistics drive costing and costing drives the decision making of the optimizer. So, how does this work with the new SQL Server 2014 natively compiled procedures? Differently.
In-memory tables do not maintain their statistics automatically. Further, you can’t run DBCC SHOW_STATISTICS to get information about those statistics, so you can’t tell if they’re out of date or not or what the distribution of the data is within them. So, if I create some memory optimized tables, skip loading any data into them and then run this standard query:
SELECT a.AddressLine1, a.City, a.PostalCode, sp.Name AS StateProvinceName, cr.Name AS CountryName FROM dbo.Address AS a JOIN dbo.StateProvince AS sp ON sp.StateProvinceID = a.StateProvinceID JOIN dbo.CountryRegion AS cr ON cr.CountryRegionCode = sp.CountryRegionCode WHERE a.City = 'London';
The estimates in the execution plan for the query show some pretty wild values:
That’s an estimated number of rows of 65,536 and an actual of zero in a table scan because I created my table without an index. If I recreate it with an index, but still no data, the estimates change and we have an index seek operation now:
That’s suggesting that the optimizer thinks there are 256 rows. But there’s no data here. So, let’s load some data into the table. Here are the new estimates from the index seek operator:
I haven’t yet updated the statistics, so the optimizer still thinks there are zero rows in the table, or at least, it has no statistics. Well, not true:
SELECT s.name, s.auto_created, s.user_created, s.filter_definition, sc.column_id, c.name AS ColumnName FROM sys.stats AS s JOIN sys.stats_columns AS sc ON sc.stats_id = s.stats_id AND sc.object_id = s.object_id JOIN sys.columns AS c ON c.column_id = sc.column_id AND c.object_id = s.object_id WHERE s.object_id = OBJECT_ID('dbo.Address');
If I run this, I’ll see statistics on the table, including system generated statistics:
One point, while statistics don’t update automatically, they clearly can still be created automatically. But I can’t run DBCC SHOW_STATISTICS to see what’s in there. So, let’s see what estimates look like in the natively compiled procedure. I’ll take the same query code above and convert it to a proc. Then, when I capture the estimated plan from the procedure (no actual plans allowed), the index seek operator shows these estimates:
Now, we have an accurate estimate on the number of rows this query is likely to return. Interesting. So, let’s drop the procedure, update the statistics and then rerun the query and the procedure. First the query. The estimates don’t change. I’m still seeing estimated values as 256 while the actual is 434. So, let’s free the procedure cache and try again:
Ah, there we go. The plan itself came out the same way, but we clearly have more accurate estimates now. On to the procedure. I’ll recreate it and then get the estimated plan. Here are the estimate values from the same index seek operation:
Oops. Still estimated 0 rows.
What’s all this mean? I’m not sure. The documentation from Microsoft in this area is sketchy. During the most recent 24 Hours of PASS, I was able to ask Microsoft about the impact of statistics on natively compiled plans. They suggested that it was not necessarily going to be the same as we see in standard queries. These tests make that fairly evident. Also, it looks like the default values of estimated rows for in-memory tables is different. If I create standard tables, empty, and run the same query against them, the estimated number of rows is what I expect, 1. But in the case of in-memory tables it’s 256 with an index and 65,536 without one (or at least that’s what I’m seeing). However, the estimates for the natively compiled procedure never changed in this test case, always at 0. This is hardly shocking, but it seems that different rules apply for in-memory tables and their statistics as well as natively compiled procedures and their consumption of those statistics. And, as Microsoft has changed the default estimated number of rows for table variables from 1 to 100 in SQL Server 2014, it seems we have another instance where they’re defaulting to an even higher value and one where the values seem to just disappear.
The behavior of statistics within in-memory tables is extremely interesting because you may see changing behavior with those tables as your queries get more complex and your data changes. It makes a very strong case for making sure that you update your statistics on a regular basis on these tables.
I’m taking this show on the road. If you want to get an all day class on query tuning, I’ve got a lot of opportunities coming up. I believe that Albany, on July 25th, is not yet sold out. You can register here. I’m teaching an all day session at SQL Connections in September in Las Vegas. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. I’m excited, and more than a little humbled, to get the opportunity to present an all day pre-conference seminar at the PASS Summit in Seattle in November. Go here to register.