Query Your Statistics: dm_db_stats_properties

SQL Server
We've always been able to look at statistics with DBCC SHOW_STATISTICS. You can even tell SHOW_STATISTICS to only give you the properties, STAT_HEADER, or histogram, HISTOGRAM. However, it's always come back in a format that you can't easily consume in T-SQL. From SQL Server 2012 to everything else, you can simply query sys.dm_db_stats_properties to get that same header information, but in a consumable fashion. dm_db_stats_properties You can pretty easily query the function dm_db_stats_properties: SELECT ddsp.object_id, ddsp.stats_id, ddsp.last_updated, ddsp.rows, ddsp.rows_sampled, ddsp.steps, ddsp.unfiltered_rows, ddsp.modification_counter, ddsp.persisted_sample_percent FROM sys.dm_db_stats_properties(OBJECT_ID('Production.Location'), 1) AS ddsp; You just have to pass in the object_id value and the stats_id value and you're off to the races. I can hear you. Why? So we can do things like this. What if I want to look at statistics that have fewer…
Read More

Query To Retrieve Statistics Data: dm_db_stats_histogram

SQL Server 2017, T-SQL
Starting with SQL Server 2016 Sp1 CU2, a new way of directly querying statistics, specifically the histogram, has been introduced: dm_db_stats_histogram. We've always been able to query the statistics using DBCC SHOW_STATISTICS. However, the output, three different result sets with three different structures, made automating access to statistics information a pain. Now, we have more capability through dm_db_stats_histogram. dm_db_stats_histogram To access the information in dm_db_stats_histogram, you just have to pass in the object_id and the statistics_id values for the statistics you're interested in like this: SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('HumanResources.Employee'), 1) AS ddsh; It's very straight forward to use. The results look like this: Handy right? Now you can query the histogram directly. Yeah, I hear a few of you saying... and this helps me... how? Here's an example. This query…
Read More

Determining the Cost Threshold for Parallelism

SQL Server, SQL Server 2016
In the previous post, I showed how you can get full listings of your execution plan costs. Knowing what the values you're dealing with for the estimated costs on your execution plans can help you determine what the Cost Threshold on your system should be. However, we don't want to just take the average and use that. You need to understand the data you're looking at. Let's explore this just a little using R. Mean, Median, Range and Standard Deviation I've used the queries in the previous blog post to generate a full listing of costs for my plans. With that, I can start to query the information. Here's how I could use R to begin to explore the data: library("RODBC", lib.loc="~/R/win-library/3.2") query <- "SELECT * FROM dbo.QueryCost;" dbhandle <-…
Read More

Scheduling Statistics Maintenance in Azure SQL Data Warehouse

Azure
The power and capabilities of Azure SQL Data Warehouse are pretty amazing. However, it's still basically SQL Server under the covers. SQL Server still needs maintenance and one type of maintenance is keeping statistics up to date. Statistics maintenance is doubly important within Azure SQL Data Warehouse because statistics are not created automatically, nor are they maintained automatically. It's up to you to ensure your statistics are up to date. Statistics in Azure SQL Data Warehouse It's worth re-emphasizing the point that there is no automatic creation of statistics in Azure SQL Data Warehouse. This means you need to plan to add statistics to your tables. Microsoft recommends an easy method is to add them to every column. While that does make things easy, it's not like statistics are free. If you…
Read More

Reinforcing the Importance of Statistics on Row Estimate

SQL Server, SQL Server 2016, T-SQL
I recently wrote an introductory post about the importance of statistics. I just received a reinforcement on how important they are during my own work. Bad Estimate I hit a weird problem while I was setting up a query to illustrate a point (blog to be published next week). Let's take the basis of the problem and explain it. I wanted data with distribution skew, so I ran this query to find out if there was a wide disparity between the top and bottom of the range: SELECT i.BillToCustomerID, COUNT(i.BillToCustomerID) AS TestCount FROM Sales.Invoices AS i GROUP BY i.BillToCustomerID ORDER BY TestCount ASC; Sure enough, the bottom of the range returned three (3) rows and the top returned 21,551. If I then run a query to retrieve just a few rows…
Read More

Statistics Are Vital For Query Performance

SQL Server, SQL Server 2016
This is post #10 supporting  Tim Ford’s (b|t) initiative on #iwanttohelp, #entrylevel. Read about it here. When you send a query to your SQL Server database (and this applies to Azure SQL Database, APS, and Azure SQL Data Warehouse), that query is going to go through a process known as query optimization. The query optimization process figures out if you can use indexes to assist the query, whether or not it can seek against those indexes or has to use a scan, and a whole bunch of other stuff. The primary driving force in making these decisions are the statistics available on the indexes and on your tables. What Are Statistics Statistics are a mathematical construct to represent the data in your tables. Instead of scanning through the data each and every…
Read More

Statistics for the New Data Pro

PASS, SQL Server, SQL Server 2016, T-SQL
Next week at the PASS Summit I'll be presenting a session called Statistics for the New Data Pro. You can read the abstract at the link. I just want to emphasize that this is a beginner level session. I think way too many people who are just starting out with SQL Server don't understand the role that statistics play in determining how your queries are going to behave. What's more, too many people don't know how to get and read statistics to understand how it is that the optimizer thinks you have X number of rows in your database that match a given value. I'm going to make darned sure that the people who attend this session come out with a full understanding of how to read the statistics. This includes…
Read More

Statistics and Natively Compiled Procedures

T-SQL
Statistics are one of the single most important driving factors for the behavior of the query optimizer. The cardinality estimates stored within the statistics drive costing and costing drives the decision making of the optimizer. So, how does this work with the new SQL Server 2014 natively compiled procedures? Differently. In-memory tables do not maintain their statistics automatically. Further, you can't run DBCC SHOW_STATISTICS to get information about those statistics, so you can't tell if they're out of date or not or what the distribution of the data is within them. So, if I create some memory optimized tables, skip loading any data into them and then run this standard query: SELECT a.AddressLine1, a.City, a.PostalCode, sp.Name AS StateProvinceName, cr.Name AS CountryName FROM dbo.Address AS a JOIN dbo.StateProvince AS sp ON sp.StateProvinceID =…
Read More

sp_updatestats Is Not Smart

SQL Server, T-SQL
No, I don't mean the use of sp_updatestats is not smart. It's a fine, quick mechanism for getting statistics updated in your system. But the procedure itself is not smart. I keep seeing stuff like "sp_updatestats knows which statistics need to be updated" and similar statements. Nope. Not true. Wanna know how I know? It's tricky. Ready? I looked at the query. It's there, in full, at the bottom of the article (2014 CTP2 version, just in case yours is slightly different, like, for example, no Hekaton logic). Let's focus on just this bit: if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0))) The most interesting part is right at the front, @ind_rowmodctr <> 0. That value is loaded with the cursor and comes from sys.sysindexes and the rowmodctr column…
Read More

The CASE Statement and Performance

SQL Server, T-SQL
In case you don't know, this query: UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '%33%'; Will run quite a bit slower than this query: UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '333%'; Or this one: UPDATE dbo.Test1 SET C2 = 1 WHERE C1 = '333'; That's because the second two queries have arguments in the filter criteria that allow SQL Server to use the statistics in an index to look for specific matching values and then use the balanced tree, B-Tree, of the index to retrieve specific rows. The argument in the first query requires a full scan against the index because there is no way to know what values might match or any path through the index to simply retrieve them. But, what if we…
Read More