Execution Plan Compile Termination

Recently I’ve been seeing a lot of people with bad execution plans, desperately trying to tune them, but unable to explain why they had such bad plans in the first place. More often than not these were larger queries, with a number of derived tables, CTEs, CROSS APPLY, etc. In most cases the statistics appeared to be fine (this is usually checked by comparing estimated & actual rows within the operations in the execution plan) and the plans themselves didn’t look crazy, but the execution plans were, flat out, not good.

If you’re looking at a plan and it doesn’t make much sense, there’s one possibility that most people don’t check: SQL Server didn’t have enough time to complete optimization. The optimizer is a pretty amazing bit of code. The scary volume of work it does in frighteningly short periods of time is quite awesome. However, it doesn’t always get the work done. To quickly check this situation you need to determine the “Reason For Early Termination Of Statement Optimization.” In the graphical execution plan this is found on the final operator in the plan, usually an INSERT, UPDATE, DELETE, or SELECT operator. Most people know to hover the mouse over an operator & get interesting little facts, but most people forget about checking the properties. Right-click on this, or any other operator, and select Properties from the context menu. This will open the Properties window, which looks something like this:
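If you don’t have a plan in front of you yet, one quick way to get one you can click through is to ask for the actual plan as XML and open it in SSMS. This is a minimal sketch; the query against the system views is just a stand-in for your real, problematic query:

```sql
-- Minimal sketch: return the actual execution plan as XML along with the results.
-- Clicking the XML in the results grid opens it as a graphical plan in SSMS,
-- where you can right-click the final operator and check its properties.
SET STATISTICS XML ON;
GO

SELECT *                       -- stand-in for your real query
FROM sys.objects AS o
JOIN sys.columns AS c
    ON c.object_id = o.object_id;
GO

SET STATISTICS XML OFF;
GO
```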

[Screenshot: the Properties window for the final operator]

Right near the bottom you can see that this execution plan is showing an early termination reason of “Time Out.” That means that the optimizer didn’t finish optimizing. When the optimizer runs, it tries a whole series of different join types, join orders, filters, what have you, all through some mathematical process that I’m not even remotely qualified to describe. But if it runs out of time, it goes with the last plan it had. The last plan it had might be the worst possible plan for this query, but because it hit a time out…

You can also find this right in the XML for the execution plan. In the Batch/Statements/StmtSimple element, find the StatementOptmEarlyAbortReason attribute. It will show the same thing when the optimizer has run out of time (in the XML the value is written without the space, as “TimeOut”).
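If you’d rather go hunting through the plan cache for these instead of checking one plan at a time, something along these lines works. It’s a sketch built on the standard DMVs, not part of the original post:

```sql
-- Sketch: pull the early-abort reason out of plans already in the cache and
-- keep only the ones that hit the optimizer time out.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT  qs.execution_count,
        qp.query_plan,
        qp.query_plan.value('(//StmtSimple/@StatementOptmEarlyAbortReason)[1]',
                            'varchar(50)') AS early_abort_reason
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE   qp.query_plan.exist('//StmtSimple[@StatementOptmEarlyAbortReason="TimeOut"]') = 1;
```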

You can get other early abort reasons, but the best one is “GoodEnoughPlanFound.” That means that the optimizer was able to complete its job. That doesn’t mean you have a good plan, but it does mean that you don’t have some interim pile of junk that is doing you more harm than good.

What can you do to fix this? Write simpler queries.

OK. Maybe that’s not helpful. Try to simplify the query. You may not be able to. If not, you might try rearranging the query, changing the join order. Yes, as everyone says, join order doesn’t matter, but the full statement is “Join order doesn’t normally matter.” Sometimes it can make a real difference. You may also have to supply some query hints, join hints, etc., to try to nudge the optimizer along.
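For example, if you’ve found a join order that works better, you can pin it down with a hint. This is a sketch only, with made-up table names, and a hint like this should be a last resort:

```sql
-- Sketch only: FORCE ORDER tells the optimizer to join the tables in the order
-- they're written, which takes a chunk of work off its plate before the time
-- limit hits. It also stops it from finding better orders, so use with care.
SELECT  c.CustomerName,
        o.OrderDate,
        ol.LineTotal
FROM    dbo.Customers AS c
JOIN    dbo.Orders AS o
        ON o.CustomerID = c.CustomerID
JOIN    dbo.OrderLines AS ol
        ON ol.OrderID = o.OrderID
OPTION  (FORCE ORDER);
```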

15 thoughts on “Execution Plan Compile Termination”

  • scarydba

The optimizer itself decides on what it thinks will be an acceptable length of time to attempt to optimize a query. It’s based on the cost estimates it arrives at after the first pass through the query. I know very few of the details as to how this works. You can’t set it and the length of time varies, so there’s not a hard number I can provide.

  • Great post, Grant! Few people realize it, but IMHO this is the primary reason that splitting a subquery off into a #temp table is a big win so often: not because temp tables are inherently better than subqueries, but because it has the side-effect of splitting one big query into two smaller queries, giving the optimizer a much better chance of finding a good plan for each of them.
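
    A rough sketch of that pattern, with invented table and column names, just to show the shape:

    ```sql
    -- Instead of one big query with the aggregate inlined as a subquery...
    SELECT  c.CustomerID, c.CustomerName, t.TotalSales
    FROM    dbo.Customers AS c
    JOIN    (SELECT CustomerID, SUM(LineTotal) AS TotalSales
             FROM dbo.OrderLines
             GROUP BY CustomerID) AS t
            ON t.CustomerID = c.CustomerID;

    -- ...split it into two smaller queries the optimizer can tackle separately.
    SELECT  CustomerID, SUM(LineTotal) AS TotalSales
    INTO    #Totals
    FROM    dbo.OrderLines
    GROUP BY CustomerID;

    SELECT  c.CustomerID, c.CustomerName, t.TotalSales
    FROM    dbo.Customers AS c
    JOIN    #Totals AS t
            ON t.CustomerID = c.CustomerID;
    ```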

  • scarydba

    You don’t need me to validate you, but you’re 100% correct. While the optimizer is an absolutely amazing bit of code, it’s not infallible. Most of the time, it will do the work for you, but occasionally, you have to pitch in and try to help it out, restructuring, rewriting, etc. Thanks for bringing that up.

  • Something that a lot of people don’t realise is how inherently difficult finding the ‘most optimal’ plan actually is. I didn’t fully, until I started studying AI in detail and started reading up on planning algorithms and what’s involved in generating and modifying plans. It’s an area with a fair amount of active, ongoing research.

    If the optimiser was allowed to examine plans until it found the best plan for a particular query, the optimisation of a moderately complex query could easily run for days.

    That’s one of the reasons that it doesn’t look for the best plan. All it’s looking for is one that’s ‘good enough’.

  • They should use a low-priority process in the background to further optimize plans according to prioritization.

    Prioritization could be:

    * How often is this query executed
    * Was I able to find a good enough plan or not
    * Query complexity estimate
    * Is the query executed as part of a larger transaction or not

    Without adding MAGIC, a better job can be done…period!

  • It should also discover this way the level of parameterization that is required. If on a second or third run the execution engine detects that what was at first considered a variable is in fact not, or is only weakly modifying, like status = 0 or status = 1.

    Appropriate plans should be devised for later reuse in spare CPU time. This can easily overlap with IO operations and thus never affect performance in a negative way.

    The execution engine can check for discovered predicates and pick the optimized plan based on the “variable” input.

  • scarydba

    Considering everything the optimizer does, I suspect this might be difficult, but it really does make sense. You’re probably on the same track as the guys writing the engine. As CPU power increases, the ability to do more in less time should open up just the types of things you’re proposing.

  • While it sounds easy, in general (and I’m talking general planning, not specific to the SQL optimiser), modifying an existing plan is harder (more complex, more resource intensive) than scrapping the plan and starting from scratch.

    There’s a very nice academic paper around that gives a formal proof of that.

  • What might be possible is for a background process to work through the cached plans looking for ones that are no longer optimal (though how to tell that a plan is not optimal is another difficult problem) and mark those for recompile. Could help out in cases of parameter sniffing or when the optimiser wasn’t able to do a good enough job the first time or in other odd cases where there is no single optimal plan.

    It’s a trade off. Each version the optimiser’s getting more complex, more resource intensive, so you want to cache plans more so as not to incur that overhead more often than necessary. On the flip side, cache too much and the queries don’t get good plans.

    It’s another area of research. Use Google Scholar and search for ‘overfitting’ if you’re interested.

  • The idea is not to start over with the plan or modify it, but to conserve the state that was left behind on the first pass, when the normal process was stopped prematurely by the determined timeout.

    A conservative implementation thus only keeps memory resources around for problem cases where no good enough plan was found on the initial pass. Those resources are freed as soon as a good enough plan is found and replaces the old ‘optimization incomplete’ plan.

  • rajg

    Great post Grant…I enjoy reading your posts very much. There is always something new I learn reading your articles/posts.

    Thanks.

