statistics Archives - Grant Fritchey

07 Feb 2022 by Grant Fritchey, 1 Comment

Query Your Statistics: dm_db_stats_properties

We've always been able to look at statistics with DBCC SHOW_STATISTICS. You can even tell SHOW_STATISTICS to return only the header properties (STAT_HEADER) or only the histogram (HISTOGRAM). However, the output has always come back in a format that you can't easily consume in T-SQL. From SQL Server 2012 onward, you can simply query sys.dm_db_stats_properties to get that same header information, but in a consumable fashion.

dm_db_stats_properties

You can pretty easily query the function dm_db_stats_properties:

SELECT ddsp.object_id, ddsp.stats_id, ddsp.last_updated, ddsp.rows, ddsp.rows_sampled, ddsp.steps, ddsp.unfiltered_rows, ddsp.modification_counter, ddsp.persisted_sample_percent FROM sys.dm_db_stats_properties(OBJECT_ID('Production.Location'), 1) AS ddsp;

You just have to pass in the object_id value and the stats_id value and you're off to the races. I can hear you: why? So we can do things like this. What if I want to look at statistics that have fewer…
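Because dm_db_stats_properties returns plain numeric columns, you can do arithmetic on them once they're in a result set. Here is a minimal Python sketch of that arithmetic. It assumes the rows have already been fetched from the function into objects carrying a subset of the columns in the query above. The helper names `sample_percent` and `modified_percent` and all the sample values are hypothetical, not real output from Production.Location.

```python
from dataclasses import dataclass


@dataclass
class StatsProperties:
    """A subset of the columns returned by sys.dm_db_stats_properties."""
    object_id: int
    stats_id: int
    rows: int
    rows_sampled: int
    modification_counter: int


def sample_percent(p: StatsProperties) -> float:
    """Percentage of the table's rows that were read when the statistics were built."""
    return 100.0 * p.rows_sampled / p.rows if p.rows else 0.0


def modified_percent(p: StatsProperties) -> float:
    """Modifications to the leading column since the last update, as a percentage of rows."""
    return 100.0 * p.modification_counter / p.rows if p.rows else 0.0


# Hypothetical rows standing in for the function's output.
stats = [
    StatsProperties(object_id=1, stats_id=1, rows=1000, rows_sampled=1000, modification_counter=50),
    StatsProperties(object_id=1, stats_id=2, rows=1000, rows_sampled=250, modification_counter=400),
]

for p in stats:
    print(p.stats_id, f"{sample_percent(p):.1f}% sampled", f"{modified_percent(p):.1f}% modified")
```

In T-SQL you would compute the same ratios directly in the SELECT list. The sketch only shows that the columns combine into useful measures.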

29 May 2018 by Grant Fritchey, 2 Comments

Query To Retrieve Statistics Data: dm_db_stats_histogram

SQL Server 2017, T-SQL

Starting with SQL Server 2016 SP1 CU2, a new way of directly querying statistics, specifically the histogram, has been introduced: dm_db_stats_histogram. We've always been able to query the statistics using DBCC SHOW_STATISTICS. However, its output, three different result sets with three different structures, made automating access to statistics information a pain. Now we have more capability through dm_db_stats_histogram.

dm_db_stats_histogram

To access the information in dm_db_stats_histogram, you just have to pass in the object_id and the stats_id values for the statistics you're interested in, like this:

SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('HumanResources.Employee'), 1) AS ddsh;

It's very straightforward to use, and the results come back as an ordinary result set with one row per histogram step. Handy, right? Now you can query the histogram directly. Yeah, I hear a few of you saying... and this helps me... how? Here's an example. This query…
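Once the histogram is a set of rows, you can reason about it in code. Below is a minimal Python sketch of the textbook reading of a histogram for an equality predicate:

- If the value matches a step's range_high_key, the estimate is that step's equal_rows.
- If the value falls inside a step's range, the estimate is that step's average_range_rows (range_rows / distinct_range_rows).

This is a simplification. The real cardinality estimator applies more logic, including handling for values outside the histogram, which this sketch simply reports as `None`. The step values are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistogramStep:
    """One row of sys.dm_db_stats_histogram (subset of columns)."""
    range_high_key: int
    range_rows: float
    equal_rows: float
    distinct_range_rows: int
    average_range_rows: float  # range_rows / distinct_range_rows


def estimate_equality(steps: list[HistogramStep], value: int) -> Optional[float]:
    """Simplified estimate of rows matching `column = value`."""
    keys = [s.range_high_key for s in steps]
    i = bisect_left(keys, value)  # first step whose high key is >= value
    if i == len(steps) or (i == 0 and value < keys[0]):
        return None  # outside the histogram: not modeled in this sketch
    step = steps[i]
    if step.range_high_key == value:
        return step.equal_rows  # exact hit on a step boundary
    return step.average_range_rows  # somewhere inside the step's range


# Hypothetical three-step histogram.
steps = [
    HistogramStep(range_high_key=1, range_rows=0, equal_rows=1, distinct_range_rows=0, average_range_rows=1),
    HistogramStep(range_high_key=10, range_rows=80, equal_rows=5, distinct_range_rows=8, average_range_rows=10),
    HistogramStep(range_high_key=20, range_rows=45, equal_rows=12, distinct_range_rows=9, average_range_rows=5),
]

print(estimate_equality(steps, 20), estimate_equality(steps, 15), estimate_equality(steps, 4))
```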

28 Feb 2017 by Grant Fritchey, 19 Comments

Determining the Cost Threshold for Parallelism

SQL Server, SQL Server 2016

In the previous post, I showed how you can get full listings of your execution plan costs. Knowing the estimated costs of the execution plans on your system can help you determine what your Cost Threshold for Parallelism should be. However, we don't want to just take the average and use that. You need to understand the data you're looking at. Let's explore this just a little using R.

Mean, Median, Range and Standard Deviation

I've used the queries in the previous blog post to generate a full listing of costs for my plans. With that, I can start to query the information. Here's how I could use R to begin to explore the data:

library("RODBC", lib.loc="~/R/win-library/3.2")
query <- "SELECT * FROM dbo.QueryCost;"
dbhandle <-…
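The reason not to just take the average is easy to see with a few numbers. Here is a Python sketch using the standard `statistics` module in place of R. The cost values are hypothetical, standing in for a dbo.QueryCost listing. A handful of very expensive plans pulls the mean far above the median.

```python
import statistics

# Hypothetical estimated plan costs; in the post these come from dbo.QueryCost.
costs = [0.003, 0.01, 0.02, 0.02, 0.5, 1.2, 3.4, 8.0, 45.0, 310.0]

summary = {
    "mean": statistics.mean(costs),
    "median": statistics.median(costs),
    "range": max(costs) - min(costs),
    "stdev": statistics.stdev(costs),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
```

A large standard deviation relative to the median is the signal that the distribution needs a closer look before picking a threshold.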

18 Jan 2017 by Grant Fritchey, 7 Comments

Scheduling Statistics Maintenance in Azure SQL Data Warehouse

Azure

The power and capabilities of Azure SQL Data Warehouse are pretty amazing. However, it's still basically SQL Server under the covers. SQL Server still needs maintenance, and one type of maintenance is keeping statistics up to date. Statistics maintenance is doubly important within Azure SQL Data Warehouse because statistics are neither created nor maintained automatically. It's up to you to ensure your statistics are up to date.

Statistics in Azure SQL Data Warehouse

It's worth re-emphasizing the point that there is no automatic creation of statistics in Azure SQL Data Warehouse. This means you need to plan to add statistics to your tables. Microsoft recommends, as an easy method, adding them to every column. While that does make things easy, it's not like statistics are free. If you…
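If you do go the "statistics on every column" route, the T-SQL is repetitive enough that generating it is tempting. Here is a minimal Python sketch that builds one single-column CREATE STATISTICS statement per column, plus an UPDATE STATISTICS statement a scheduled job could run later. The table name dbo.FactSales, its columns, and the `stat_<table>_<column>` naming scheme are hypothetical. Remember that every statistics object you create has to be built and maintained.

```python
def create_stats_statements(schema: str, table: str, columns: list[str]) -> list[str]:
    """One single-column CREATE STATISTICS statement per column (hypothetical naming scheme)."""
    return [
        f"CREATE STATISTICS [stat_{table}_{col}] ON [{schema}].[{table}] ([{col}]);"
        for col in columns
    ]


def update_stats_statement(schema: str, table: str) -> str:
    """Refresh all statistics on the table; the kind of command a scheduled job would run."""
    return f"UPDATE STATISTICS [{schema}].[{table}];"


# Hypothetical fact table and columns.
statements = create_stats_statements("dbo", "FactSales", ["OrderDateKey", "CustomerKey", "SalesAmount"])
for sql in statements:
    print(sql)
print(update_stats_statement("dbo", "FactSales"))
```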

03 Nov 2016 by Grant Fritchey, 1 Comment

Reinforcing the Importance of Statistics on Row Estimate

SQL Server, SQL Server 2016, T-SQL

I recently wrote an introductory post about the importance of statistics. I just received a reinforcement of how important they are during my own work.

Bad Estimate

I hit a weird problem while I was setting up a query to illustrate a point (blog to be published next week). Let's take the basis of the problem and explain it. I wanted data with distribution skew, so I ran this query to find out if there was a wide disparity between the top and bottom of the range:

SELECT i.BillToCustomerID, COUNT(i.BillToCustomerID) AS TestCount FROM Sales.Invoices AS i GROUP BY i.BillToCustomerID ORDER BY TestCount ASC;

Sure enough, the bottom of the range returned three (3) rows and the top returned 21,551. If I then run a query to retrieve just a few rows…
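Skew like that is exactly why a single "average" estimate is dangerous. Here is a minimal Python sketch of the GROUP BY above, using `collections.Counter` on hypothetical BillToCustomerID values. It compares the extremes with the estimate you would get by assuming every value is equally common (total rows divided by distinct values, roughly what a density-based estimate gives). The data is made up and much smaller than Sales.Invoices.

```python
from collections import Counter

# Hypothetical BillToCustomerID values, heavily skewed toward customer 1.
bill_to = [1] * 500 + [2] * 40 + [3] * 3 + [4] * 57

# Equivalent of GROUP BY BillToCustomerID ... ORDER BY TestCount ASC.
counts = sorted(Counter(bill_to).items(), key=lambda kv: kv[1])
print("bottom:", counts[0], "top:", counts[-1])

# Uniform assumption: every customer gets the same number of rows.
uniform_estimate = len(bill_to) / len(counts)
print("uniform estimate per customer:", uniform_estimate)
```

The uniform figure overestimates the small customer by 50x and underestimates the large one by more than 3x. A histogram that tracks individual values is what avoids that.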

10 Oct 2016 by Grant Fritchey, 2 Comments

Statistics Are Vital For Query Performance

SQL Server, SQL Server 2016

This is post #10 supporting Tim Ford's (b|t) initiative on #iwanttohelp, #entrylevel. Read about it here. When you send a query to your SQL Server database (and this applies to Azure SQL Database, APS, and Azure SQL Data Warehouse), that query is going to go through a process known as query optimization. The query optimization process figures out whether indexes can be used to assist the query, whether it can seek against those indexes or has to use a scan, and a whole bunch of other stuff. The primary driving force in making these decisions is the statistics available on the indexes and on your tables.

What Are Statistics

Statistics are a mathematical construct to represent the data in your tables. Instead of scanning through the data each and every…
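To make "a mathematical construct to represent the data" concrete, here is a minimal Python sketch. It summarizes a column once into histogram-style steps that keep the same kinds of counts SQL Server's histogram reports. This is not SQL Server's algorithm. SQL Server caps a histogram at 200 steps but chooses step boundaries differently; this sketch simply makes every k-th distinct value a step boundary. The function name `build_histogram` and the sample values are hypothetical.

```python
from collections import Counter
from math import ceil


def build_histogram(values: list[int], max_steps: int = 200) -> list[dict]:
    """Simplified histogram: every k-th distinct value becomes a step's high key."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    counts = sorted(Counter(values).items())  # [(value, frequency)] in ascending order
    stride = ceil(len(counts) / max_steps)  # distinct values per step, so steps <= max_steps
    steps = []
    range_rows = 0  # rows strictly between the previous high key and the current one
    distinct_in_range = 0  # distinct values strictly inside the current range
    for i, (value, freq) in enumerate(counts):
        if (i + 1) % stride == 0 or i == len(counts) - 1:
            steps.append({
                "range_high_key": value,
                "equal_rows": freq,
                "range_rows": range_rows,
                "distinct_range_rows": distinct_in_range,
            })
            range_rows = 0
            distinct_in_range = 0
        else:
            range_rows += freq
            distinct_in_range += 1
    return steps


sample = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]  # hypothetical column values
for step in build_histogram(sample, max_steps=3):
    print(step)
```

Once built, the steps answer "how many rows look like this?" without touching the table again. That trade of accuracy for speed is what statistics give the optimizer.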

20 Oct 2015 by Grant Fritchey, No Comments

Statistics for the New Data Pro

PASS, SQL Server, SQL Server 2016, T-SQL

Next week at the PASS Summit I'll be presenting a session called Statistics for the New Data Pro. You can read the abstract at the link. I just want to emphasize that this is a beginner level session. I think way too many people who are just starting out with SQL Server don't understand the role that statistics play in determining how your queries are going to behave. What's more, too many people don't know how to get and read statistics to understand how it is that the optimizer thinks you have X number of rows in your database that match a given value. I'm going to make darned sure that the people who attend this session come out with a full understanding of how to read the statistics. This includes…

02 Jul 2014 by Grant Fritchey, 5 Comments

Statistics and Natively Compiled Procedures

T-SQL

Statistics are one of the most important driving factors for the behavior of the query optimizer. The cardinality estimates stored within the statistics drive costing, and costing drives the decision making of the optimizer. So, how does this work with the new SQL Server 2014 natively compiled procedures? Differently. In-memory tables do not maintain their statistics automatically. Further, you can't run DBCC SHOW_STATISTICS to get information about those statistics, so you can't tell whether they're out of date or what the distribution of the data within them is. So, if I create some memory-optimized tables, skip loading any data into them, and then run this standard query:

SELECT a.AddressLine1, a.City, a.PostalCode, sp.Name AS StateProvinceName, cr.Name AS CountryName FROM dbo.Address AS a JOIN dbo.StateProvince AS sp ON sp.StateProvinceID =…

11 Mar 2014 by Grant Fritchey, 4 Comments

sp_updatestats Is Not Smart

SQL Server, T-SQL

No, I don't mean the use of sp_updatestats is not smart. It's a fine, quick mechanism for getting statistics updated in your system. But the procedure itself is not smart. I keep seeing statements like "sp_updatestats knows which statistics need to be updated." Nope. Not true. Wanna know how I know? It's tricky. Ready? I looked at the query. It's there, in full, at the bottom of the article (2014 CTP2 version, just in case yours is slightly different, like, for example, no Hekaton logic). Let's focus on just this bit:

if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))

The most interesting part is right at the front, @ind_rowmodctr <> 0. That value is loaded by the cursor and comes from sys.sysindexes and the rowmodctr column…
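The quoted condition is small enough to restate as a predicate. This lets you see exactly what does, and doesn't, trigger an update. Here is a minimal Python transcription of just that one `if`. The function name is hypothetical; the two parameters mirror @ind_rowmodctr and @is_ver_current, with SQL NULL represented as `None`. Nothing in the condition looks at how many rows the table has, so a single modification qualifies.

```python
from typing import Optional


def sp_updatestats_would_update(ind_rowmodctr: int, is_ver_current: Optional[int]) -> bool:
    """Transcription of the quoted condition:
    if ((@ind_rowmodctr <> 0) or ((@is_ver_current is not null) and (@is_ver_current = 0)))
    """
    return ind_rowmodctr != 0 or (is_ver_current is not None and is_ver_current == 0)


# One modified row is enough, whether the table holds ten rows or ten million;
# table size is not an input to this check.
print(sp_updatestats_would_update(ind_rowmodctr=1, is_ver_current=1))  # True
print(sp_updatestats_would_update(ind_rowmodctr=0, is_ver_current=1))  # False
```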

18 Feb 2014 by Grant Fritchey, 14 Comments

The CASE Statement and Performance

SQL Server, T-SQL

In case you don't know, this query:

UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '%33%';

will run quite a bit slower than this query:

UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '333%';

or this one:

UPDATE dbo.Test1 SET C2 = 1 WHERE C1 = '333';

That's because the last two queries have arguments in the filter criteria that allow SQL Server to use the statistics in an index to look for specific matching values and then use the balanced tree (B-tree) of the index to retrieve specific rows. The argument in the first query requires a full scan against the index, because there is no way to know what values might match and no path through the index to simply retrieve them. But, what if we…
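The difference comes down to ordering. An equality or a leading-prefix match selects a contiguous run of keys in sort order, so you can binary-search to its ends. A "contains" match can be anywhere, so every key has to be checked. Here is a minimal Python model that uses a sorted list and `bisect` as a stand-in for B-tree navigation. It models the ordering argument only, not SQL Server's storage engine. The key set is hypothetical, and the prefix search assumes keys contain no characters at or above U+FFFF.

```python
from bisect import bisect_left, bisect_right


def seek_equal(index: list[str], value: str) -> list[str]:
    """Like C1 = '333': binary search straight to the matching keys."""
    lo, hi = bisect_left(index, value), bisect_right(index, value)
    return index[lo:hi]


def seek_prefix(index: list[str], prefix: str) -> list[str]:
    """Like C1 LIKE '333%': all matches sort between prefix and prefix + U+FFFF."""
    lo = bisect_left(index, prefix)
    hi = bisect_left(index, prefix + "\uffff")
    return index[lo:hi]


def scan_contains(index: list[str], fragment: str) -> tuple[list[str], int]:
    """Like C1 LIKE '%33%': ordering can't help, so every key is examined."""
    matches = [key for key in index if fragment in key]
    return matches, len(index)


# Hypothetical string keys, sorted the way an index would keep them.
index = sorted(str(n) for n in range(1, 5000))
print(seek_equal(index, "333"))
print(seek_prefix(index, "333"))
matches, examined = scan_contains(index, "33")
print(len(matches), "matches after examining", examined, "keys")
```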
