PASS Summit 2010, Day 3 Keynote

Today is Dr. DeWitt.

The ballroom, where the keynotes are held, is filled with extra chairs. The Summit organizers expect extra attendance today, and well they should. Dr. DeWitt was amazing last year. I suspect this year will be more of the same.

Rick Heiges is introducing the day (waiting for Dr. DeWitt). Lynda Rab is leaving the board. Sad. I started volunteering for the PASS organization working for Lynda. She’s great. The new board members are Douglas McDowell, Andy Warren and Allen Kinsel.

The spring SQL Rally event was announced. I’ll be presenting a full-day session on query performance, Query Performance Tuning, Start to Finish. Look for (a lot) more blog posts on this. The Summit next year has been moved to mid-October. WHOOP! This is great because I was going to miss it next year. The 2011 dates will be October 11-15. Of course, it’ll be in Seattle.

Dr. DeWitt is finally on stage. From this point forward, I’ll just be posting his words & some comments. This is my best attempt to capture the information. There will be typos.

Query optimization is a really hard problem. Dr. DeWitt says, “I’m running out of ideas.” Yeah, right. His “Impress Index” is basically an arrow going down. He’s cracking jokes about his delivery, asking, “How can I possibly impress you?” He’s showing this strange picture with 240 separate colors, each representing an execution plan in the optimizer. We’ll be back to that. This session was voted on; I’m glad optimization won. The optimizer developers live in fear of regressions.

The 100,000-foot view: magic happens. He’s working off the TPC-H benchmark, query 8. There are 22 million ways of executing this query. The optimizer has to spend a few seconds picking the correct plan from this full set, and it’s still possible to pick bad plans. Cost-based optimization came from System R and Pat Selinger at IBM. After 30 years, optimization is still the hardest part of building a DBMS. The situation is further complicated by advances in hardware and in functionality within the DBMS.

The goal of the optimizer is to transform SQL queries into an efficient execution plan. The parser turns out a logical operator tree, which goes to the optimizer, and a physical operator tree is sent to the execution engine. He’s showing a simple table, based on movie reviews. The query is a SELECT with AVG. Two possible plans. In plan 1, a scan occurs first, then a filter is applied to pull out the right movie, and then an aggregate occurs. With this you get a scan, meaning I/O corresponding to the number of pages in the table. Plan 2 pulls pages through the non-clustered index, meaning random disk access to look up the movies, which are then passed on to the aggregate. The optimizer has to figure out which is faster. It estimates the cost based on the statistics it has in hand. It has to estimate how many movies there are, so it estimates the selectivity of the predicate, then calculates the cost of each plan in terms of CPU and I/O time.

So there are equivalence rules for the select and join operators. Join operators are associative, meaning the order in which joins are grouped doesn’t change the results. The select operator distributes over joins. So there are multiple ways of getting back the same information, all evaluated by the optimizer.

With a more complicated query, it could start with a selection of customers, then a selection of reviews, join them together, then join to the movies table, and then project out the columns wanted. But with the equivalence rules, you can get other plans: the “selects distribute over joins” rule gets a different plan, and the “selects commute” rule can change the plan again. He showed five different plans, then four more, and said he could have done another 20. For this simple query, he came up with 9 logically equivalent plans, and all nine will produce the same data. For each of the 9 plans there is a large number of alternate physical plans that the optimizer can choose from.

Assume the optimizer has three join strategies: nested loops, sort-merge & hash. He’s also assuming two selection strategies, sequential scan or index scan. Obviously, this is simplified. Using these three join methods and two select methods, there are 36 possible physical alternatives for one logical plan (presumably 3² × 2², for the two joins and two selections). So with 9 logical plans there are 9 * 36 = 324 possible physical plans. And that’s for a VERY simple query.
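
Just to make the arithmetic concrete, here’s a quick back-of-the-envelope count in Python. It’s my own sketch of the numbers above, assuming the example query has two joins and two selections; it’s not code from the talk.

```python
# Back-of-the-envelope count of the physical plan space for the example,
# assuming (as in the talk) 3 join strategies and 2 selection strategies.
JOIN_METHODS = 3      # nested loops, sort-merge, hash
SELECT_METHODS = 2    # sequential scan, index scan

def physical_plans_per_logical_plan(num_joins, num_selects):
    """Each join and each selection can be implemented independently."""
    return JOIN_METHODS ** num_joins * SELECT_METHODS ** num_selects

per_logical = physical_plans_per_logical_plan(num_joins=2, num_selects=2)
print(per_logical)       # 36 physical alternatives for one logical plan
print(9 * per_logical)   # 324 physical plans across the 9 logical plans
```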

Selectivity estimation is the task of estimating how many rows will satisfy a predicate like MovieId = 932. Plan quality is highly dependent on the quality of the estimates the optimizer makes.

I just sent in a question.

A histogram describes the distribution of the data within the table. There isn’t enough space within the database to store detailed statistical information, so the solution is histograms, and you can use different kinds. An equi-width histogram divides the value range into equal-sized buckets and then figures out how many rows fall into each range of values. So, for one value the actual selectivity might be .059 while the estimate is .050, which is extremely close. But another value he shows has an actual selectivity of .011 while the histogram says .082, which is a HUGE error. Hello, bad execution plan.

Another approach is equi-height histograms. These divide the ranges so that all buckets contain roughly the same number of rows, as opposed to equal-sized ranges of values. In equi-height, the second example comes out at .033 instead of .082, which is pretty good, but still skewed; the first example is .167. He’s basically showing that errors can be introduced all over the place.
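
To see why equi-height (equi-depth) tends to behave better on skewed data, here’s a toy sketch in Python. The data, bucket counts, and estimation formulas are my own simplified stand-ins, not SQL Server’s actual statistics implementation, and the numbers won’t match the ones on the slides.

```python
# Toy histograms over a made-up skewed MovieId column: one very popular
# movie, a long tail of rare ones. Both estimates are wrong; the equi-depth
# one is usually less wrong on skewed data.
values = [1] * 900 + [v for v in range(2, 500) for _ in range(18)]

def equi_width_estimate(values, value, buckets=10):
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        counts[min(int((v - lo) / width), buckets - 1)] += 1
    i = min(int((value - lo) / width), buckets - 1)
    # Assume the bucket's rows are spread evenly over ~width distinct values.
    return counts[i] / width / len(values)

def equi_depth_estimate(values, value, buckets=10):
    ordered = sorted(values)
    per_bucket = len(ordered) // buckets
    for b in range(buckets):
        bucket = ordered[b * per_bucket:(b + 1) * per_bucket]
        if bucket and bucket[0] <= value <= bucket[-1]:
            # Spread the bucket's rows evenly over its distinct values.
            return len(bucket) / len(set(bucket)) / len(values)
    return 0.0

actual = values.count(1) / len(values)
print(f"actual {actual:.3f}  "
      f"equi-width {equi_width_estimate(values, 1):.3f}  "
      f"equi-depth {equi_depth_estimate(values, 1):.3f}")
```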

Histograms are the critical tool for estimating selectivity factors for selection predicates, but errors still occur. The deal is, there’s just a limited amount of space for them. Other statistics include row counts, page counts, etc.

To estimate costs, the optimizer considers I/O time and CPU time. Actual values are highly dependent on the CPU and I/O subsystem the query will run on. For a parallel database system such as PDW (plug), the problem also includes network traffic. So back to the two alternative physical plans… You have to determine which plan is cheaper. Assuming the optimizer gets it right, we know there are 100 matching rows and 100K pages in the table. The table is sorted on date, but we’re going after MovieID, so index lookups mean random reads. The optimizer doesn’t know what system it’s on, but it makes a guess that a scan will take 8 seconds. The filter works at .1 microsecond/row and the aggregate at .1 microsecond/row (about .00001 seconds), for a total of about 9 seconds. Plan two will use the index. Since the rows are sorted on date, random seeks are going to occur: at .003 seconds per seek the total time is about .3 seconds, plus the same aggregate time. This means plan two is the winner.
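
Here’s a toy cost model in Python that replays that comparison. The constants are the ones I jotted down (8-second scan, .1 microsecond of CPU per row, .003 seconds per seek, 100 matching rows); the one-million-row table size is my own assumption since I didn’t capture it, so the totals won’t match the slide exactly.

```python
# Toy replay of the scan-vs-index cost comparison from the notes.
SCAN_TIME_SEC = 8.0          # sequential scan of the whole table
CPU_PER_ROW_SEC = 0.1e-6     # filter / aggregate cost per row
SEEK_TIME_SEC = 0.003        # one random I/O via the non-clustered index

TABLE_ROWS = 1_000_000       # my assumption, not from the slides
MATCHING_ROWS = 100

def plan1_scan_cost():
    # Scan everything, filter every row, aggregate the survivors.
    return (SCAN_TIME_SEC + TABLE_ROWS * CPU_PER_ROW_SEC
            + MATCHING_ROWS * CPU_PER_ROW_SEC)

def plan2_index_cost(matching_rows=MATCHING_ROWS):
    # One random seek per qualifying row, then aggregate those rows.
    return matching_rows * SEEK_TIME_SEC + matching_rows * CPU_PER_ROW_SEC

print(f"plan 1 (scan):  {plan1_scan_cost():.3f} sec")
print(f"plan 2 (index): {plan2_index_cost():.3f} sec")

# If the estimate of 100 matching rows is wrong, the picture flips:
for rows in (100, 1_000, 10_000, 100_000):
    winner = "index" if plan2_index_cost(rows) < plan1_scan_cost() else "scan"
    print(f"{rows:>7} matching rows -> {winner}")
```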

But what if the estimates are wrong? On a log plot, you start to see how each plan performs as the number of rows returned changes: more rows make plan 1 better, fewer rows make plan 2 better.

That was just to get the data out of one table. When you add in JOIN costs, things get worse. The first example is a sort-merge join, which sorts each data set being returned and then merges the results through a simple scan. The cost is 5R + 5M for I/O. A nested loop join scans one table and, row by row, scans the other table. The cost is R + r * M, where R and M are page counts and r is the number of rows in the outer table.

With the example, you can see that with an index in place and a highly selective predicate, loop joins can be cheap. But it’s the cardinalities that affect things, so getting the histogram right is the key trick. With a log plot, again, you see how the costs of the various operations vary with the number of rows. A sort-merge join is very expensive at a low number of rows, but at a large number of rows it still returns in about the same amount of time. So as large sets of data are accessed, merge gets good, but at lower numbers of rows the nested loop works better. If the cardinality estimate is off, you could get a huge error in performance, especially with the larger sets of data. The optimizer has to pick the right join method, and that’s based on the number of rows in each set of data being joined.
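
A rough sketch of those cost curves in Python, using the formulas as I understood them. The page counts and the per-probe cost of the indexed nested loop are made-up numbers, just to show the crossover.

```python
# Rough I/O cost curves for the join methods above. R and M are page counts;
# 5*(R+M) for sort-merge and R + r*M for a naive nested loop follow the
# formulas as I jotted them down. The table sizes and the 3-page index probe
# cost are my own invented numbers.
R_PAGES, M_PAGES = 1_000, 5_000
PROBE_PAGES = 3   # assumed cost of one index lookup into M

def sort_merge_io():
    return 5 * (R_PAGES + M_PAGES)             # flat, regardless of cardinality

def naive_nested_loop_io(outer_rows):
    return R_PAGES + outer_rows * M_PAGES      # rescan inner per outer row

def indexed_nested_loop_io(outer_rows):
    return R_PAGES + outer_rows * PROBE_PAGES  # probe the index per outer row

for rows in (10, 100, 1_000, 10_000, 100_000):
    inl, sm = indexed_nested_loop_io(rows), sort_merge_io()
    winner = "nested loop" if inl < sm else "sort-merge"
    print(f"{rows:>7} outer rows: indexed NL={inl:>8,}  sort-merge={sm:,}  -> {winner}")
```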

He then moves on to talk about how much space these things take up. The space depends on the “shape” of the query. He shows a type called a “star” join and a type called a “chain” join. Whoa! As you increase the number of tables, the number of likely plans increases a lot. I knew this, but I haven’t seen it written down like this. But these shapes are extremes.

Every query optimizer starts off with left-deep plans first, instead of bushy plans. For the example, a bushy tree would have 645K equivalents for the star join as opposed to 10K for left-deep plans. With 3 join methods and n joins in a query, there will be 3 to the power of n possible physical plans. Uh… wow. Instead of enumerating them all, the optimizer uses dynamic programming. Sometimes heuristics will cause the best plan to be missed.
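
For a rough feel of how fast this blows up, here’s a generic count in Python. It ignores the query shape (which is exactly what makes star and chain different), so it won’t reproduce the 10K / 645K figures from the slide; it just shows the growth.

```python
from math import factorial, comb

# Generic count of the join-order space, ignoring the query "shape", which
# prunes many of these orderings in a real star or chain query.
def left_deep_orders(n):
    # A left-deep plan is just an ordering of the n tables.
    return factorial(n)

def bushy_trees(n):
    # Orderings times binary tree shapes: n! * Catalan(n-1).
    return factorial(n) * comb(2 * (n - 1), n - 1) // n

def physical_plans_per_order(num_joins, join_methods=3):
    return join_methods ** num_joins

for n in range(2, 8):
    print(f"{n} tables: left-deep={left_deep_orders(n):>6,}  "
          f"bushy={bushy_trees(n):>9,}  "
          f"x {physical_plans_per_order(n - 1)} join-method choices per order")
```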

One method of optimization is bottom-up. Optimization is performed in N passes (if N relations are joined). Pass 1: find the best 1-relation plan for each relation. Pass 2: find the best way to join the result of each 1-relation plan to another relation to generate all 2-relation plans. Pass N: find the best way to join the result of each (N-1)-relation plan to the remaining relation (I couldn’t see all of the slide). It keeps the lowest-cost plans and plans with interesting row orders. In spite of pruning the plan space, this approach is still exponential in the number of tables. Costs are calculated, then pruning occurs. I’ve stopped taking notes on this part. You’ll have to see how this works in the slide deck (I’ll post the location at the end).
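
Since I stopped taking notes, here’s a minimal sketch of the bottom-up enumeration as I understand it (Selinger-style dynamic programming). The table names, cardinalities, and cost function are toy values, and it skips the “interesting order” bookkeeping entirely.

```python
from itertools import combinations

# Minimal bottom-up join enumeration: pass 1 keeps the best single-relation
# plan, pass k extends the best (k-1)-relation plans by one more relation,
# keeping only the cheapest plan per subset of relations.
CARD = {"customers": 10_000, "reviews": 500_000, "movies": 5_000}
SELECTIVITY = 0.0001  # assumed selectivity for every join predicate

def join_cost(left_rows, right_rows):
    return left_rows + right_rows          # toy cost: sum of input sizes

def output_rows(left_rows, right_rows):
    return max(1, int(left_rows * right_rows * SELECTIVITY))

def best_plans(relations):
    # best[subset] = (total_cost, output_rows, plan_description)
    best = {frozenset([r]): (0.0, CARD[r], r) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            candidates = []
            for rel in subset:                    # join one relation last
                rest = subset - {rel}
                cost, rows, plan = best[rest]
                candidates.append((cost + join_cost(rows, CARD[rel]),
                                   output_rows(rows, CARD[rel]),
                                   f"({plan} JOIN {rel})"))
            best[subset] = min(candidates)        # prune: keep the cheapest
    return best[frozenset(relations)]

print(best_plans(["customers", "reviews", "movies"]))
```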

So that’s the theory. But the problem is, bad plans can be picked: if the statistics are missing or out of date, if cardinality estimates are made against skewed data, or if attribute values are correlated; and regressions and hardware changes mess stuff up.

Opportunities to improve: Jayant Haritsa has the Picasso Project. Bing this: Picasso Haritsa. There’s actually software there that helps visualize this. He’s back to TPC-H query 8, and the tool shows the plan space for the query; this is the painting-like picture from the start of the talk. With this, you can see how sensitive plan generation is to the input parameters. So the cardinality estimates are the key.

This animation shows how the estimated costs for a query start low, peak, and then, instead of continuing up, go back down. And the optimizer team doesn’t know why. This is his example of how QO is, indeed, harder than rocket science.

What can you do better? Well, indexed nested loops look good, but they’re not stable across the range of selectivity factors. If the optimizer went conservative and always picked sort-merge, it would be more stable. So picking slower operations could make things more stable, just slower. Robustness is tied to the number of plans. And he says the QO team doesn’t understand this.

At QO time, have the QO annotate compiled query plans with statistics and check operators. Then you can see how this stuff works. They use this in two ways: a learning optimizer and dynamic reoptimization. With the learning optimizer, the observed stats go back to a statistics tracker, which feeds them back into the catalog, and the next query will be better. Dynamic reoptimization takes the idea that the actual stats won’t match the estimated stats; when there are differences, it pauses the execution, writes the intermediate output to tempdb, stores that, and then re-optimizes the rest of the query using the real values. Cool!
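
My own hand-wavy sketch of the check-operator idea, just to show the concept; the threshold, names, and mechanics are made up and aren’t how any product actually implements it.

```python
# Concept sketch: wrap an operator, compare the rows actually flowing through
# it against the optimizer's estimate, and signal a re-optimization once the
# error gets big enough. Threshold and names are invented for illustration.
class CardinalityMismatch(Exception):
    pass

def check_operator(rows_iter, estimated_rows, max_error=10.0):
    """Yield rows, but bail out if the actual count exceeds the estimate
    by more than max_error times (the signal to pause and re-optimize)."""
    actual = 0
    for row in rows_iter:
        actual += 1
        if actual > estimated_rows * max_error:
            raise CardinalityMismatch(
                f"estimated {estimated_rows} rows, already saw {actual}")
        yield row

# Toy usage: the optimizer guessed 100 rows, the scan actually produces 5,000.
try:
    for _ in check_operator(iter(range(5_000)), estimated_rows=100):
        pass
except CardinalityMismatch as err:
    print("re-optimize the remainder of the plan with actual stats:", err)
```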

Key points: Query optimization is harder than rocket science. There are three phases of QO: enumeration of the logical plan space, enumeration of alternate physical plans, and selectivity estimation. The QO team of every DB vendor lives in fear of regressions, but they’re going to happen, so cut the optimizer some slack.

“Microsoft Jim Gray Systems Lab” on Facebook is the source for the slides. Available here.
