Execution Plan Cost Estimates

SQL Server, T-SQL
It's been emphasized over and over that the costs of operations within an execution plan, and the estimated costs of the plans themselves, are, in fact, estimates. But it goes further than that. The estimated values are based on statistics, or the lack thereof. Statistics themselves are also estimates. This means that the costs you're seeing are extrapolations based on extrapolations. So, you should just ignore those values and move on, right? Wrong. In order to understand how the optimizer is choosing to put together an execution plan for your query so that you can use that understanding to then make intelligent choices as to modifying the query or the structure of your database, you must use the values you have at hand. However, you must also understand where and…

Statistics Update Clarification

T-SQL
By default, statistics are created automatically within SQL Server. And, by default, these stats are updated automatically based on a set of triggers. The triggers are defined as:

If 0 rows - Any data added leads to a statistics update
If < 500 rows - 500 rows added causes a stats update
If > 500 rows - 500 rows + 20% of the number of rows causes a stats update (unless you enable a traceflag in 2012, in which case you get a proportional value instead of 20%)

There are some exceptions for temporary tables and some variations for filtered statistics and filtered indexes, but you get the idea. I was writing an article on statistics in preparation for another Oracle/SQL Server discussion (on, you guessed it, statistics) and I…
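
The default update triggers described in this excerpt come down to simple arithmetic, which the rough Python sketch below works through. It is only an illustration of the rule as stated here, not SQL Server's actual implementation. The function name `stats_update_threshold` is hypothetical. The handling of a table with exactly 500 rows and the rounding down of the 20% term are assumptions, since the excerpt does not specify either. The trace flag behavior is not modeled.

```python
def stats_update_threshold(row_count: int) -> int:
    """Approximate number of row modifications before an automatic
    statistics update is triggered, per the default rules in the excerpt.

    Assumptions (not stated in the excerpt): exactly 500 rows is treated
    like the small-table case, and the 20% term is rounded down.
    """
    if row_count < 0:
        raise ValueError("row_count cannot be negative")
    if row_count == 0:
        # Empty table: any data added leads to a statistics update.
        return 1
    if row_count <= 500:
        # Small table: 500 rows added causes an update.
        return 500
    # Larger table: 500 rows plus 20% of the number of rows.
    return 500 + row_count // 5


# Show how the threshold grows with table size.
for rows in (0, 100, 1_000, 1_000_000):
    print(f"{rows:>9} rows -> {stats_update_threshold(rows):>7} modifications")
```

Under this rule, a million-row table needs about 200,500 modifications before its statistics refresh automatically, which is why large tables are often the ones with stale statistics.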

Clustered Indexes Have Statistics Too

SQL Server, T-SQL
It may seem obvious, but I've heard more than one person suggest to me that statistics on a clustered index just don't matter. That if the clustered index can satisfy a given query, it's going to get selected. That just didn't make any sense to me, but I haven't seen anyone set up a test that shows how it might work one way or the other. Here you go. First, I'm going to create a table and load it up with data. I'm intentionally using strings because I don't want to confuse the ease of management of integers within indexes. I also went for one column that would have a very attractive set of statistics and one that would have a very ugly set. Also, because we're only dealing with…

Statistics in Execution Plans

SQL Server, T-SQL
I was presenting on execution plans when another question came up that I didn’t know the answer to immediately. Yes, I know you’ve seen that phrase before on this blog. I love presenting because you get exactly the kinds of questions that make you think and make you learn. I’m presenting, in part, to learn, just as much as I am to teach. It was the same with kenpo. The more I taught, the better I learned the art. Wait, this isn’t supposed to be a blog post about learning. This one is about statistics. The question was, does the execution plan have the statistics that were used by the optimizer to decide on the execution plan? And no, what was meant, was not does it show the estimated rows,…

SQL University: Introduction to Indexes, Part the Third

SQL Server, T-SQL
Nice to see most of you have managed to fight your way through the shoggoths outside to attend another lecture at the Miskatonic branch of SQL University. This will be the third and final part of the introduction to indexes lecture. Please, if you're going mad, step out into the hall. Our previous two lectures introduced the concept of indexes and then talked about two types of indexes, clustered and nonclustered. This lecture will cover the concept of statistics as they relate to indexes. If you followed the previous lecture then you know that indexes are stored in a Balanced Tree or B-Tree structure. You know that this storage mechanism is intended to provide fast retrieval of data. But, how can the query engine inside SQL Server know which index…

Index Statistics

T-SQL
The other day a developer showed up at my desk. They were getting time-outs in production on a query that didn't normally give them trouble. With the parameters they provided, I ran the query. It ran for over 30 seconds, the application side timeout, before it returned its data. So I ran it again with an execution plan. It had a bunch of index scans with loop joins across thousands of rows and even created a table spool with 700 million rows as part of the process. Clearly not good. Next I looked at the query plan. It wasn't too bad, as these things go. It was probably moving too many columns and apparently the business wanted a pivot on the data since they were using an aggregate method to pivot some…