Same Query, Different Servers, Different Performance. Now What?

Based on the number of times I see this question on forums, it must be occurring all the time. You have two different servers that, as far as you know, are identical in terms of their options and setup (although not necessarily in terms of power, think a test or pre-production system versus production). On these servers you have a database on each that, as far as you know, is the same as the other in terms of options, objects, maybe even data (although, this does mean that you have unmasked production information in your QA environment, which potentially means you’re going to jail, might want to address this, especially now that I’ve told you about it, mens rea, you’re welcome). On each database you run, as far as you know, the exact same query (whether a straight up ad hoc query, stored procedure, parameterized query, whatever). On your production server, let’s call it PROD, the query runs for 3 hours before it returns, but on the test server, let’s call it TEST, it returns in about 3 seconds.

What. The. Hell.

This is especially troubling because PROD has 8 processors that are much more powerful than TEST, which has only 2 (either can go parallel, so this is the same). TEST only has 16GB of RAM and PROD has 64GB. Further, you know that the databases, data, and statistics are the same because you restored the TEST database from PROD (although, see that jail thing above). However, you’re on top of things. You capture the query metrics so you know exactly the amount of time and the reads or writes from each query and you even do this using extended events so you’re minimizing observer effects. You also have monitoring in place, so you know that there is no blocking in PROD. The query is just a pig there, but not on TEST. You’ve even gone so far as to get an estimated execution plan from each server and they are absolutely different.

Now what?

Well obviously something somewhere is different. Start by comparing everything on both servers and both databases down to… hang on, here, we’ll write a PowerShell script and then….

Wait, wait, wait!

You have the execution plans? Before we start digging through all the properties everywhere and comparing everything to everything, updating statistics 14 times, and all the rest, what if we look at the execution plans. They’re different, so we should start looking at scans & indexes & statistics &….

Wait!

Let’s start simple and clear. Let’s take a look at the properties of the first operator:

AllProperties

This contains a ton of information about the settings of the systems where each plan was generated. Let’s assume that, probably, the databases are the same, as is the data and the statistics, but the problem is a system or database setting. These differences can result in different execution plans and therefore different execution times. If we are dealing with two different servers and we are fairly sure the data, structure and the statistics are the same, the properties of the first operator are a great starting point for understanding what went wrong.

Oh, and the first operator is this one with the red square around it:

first operator

Let’s use the new SSMS Compare Showplan to see the differences between our two execution plans that we captured:

properties compared

(clicking on that will make it bigger)

There are a bunch of differences highlighted, but one ought to jump out pretty quick. That’s right, these two databases have different compatibility levels which resulted in one of them using the new optimizer and one using the old optimizer as evidenced by the CardinalityEstimatorModelVersion. Fun point, both have identical query hash values. Not surprising, but additional, quick, validation that we really are talking about the same query on each server (and one of them isn’t using a trace flag to change the cardinality estimator). You now have enough information to go and make some changes to your system without lots of further fumbling and work.

When dealing with the same query from two servers that ought to result in similar behavior, but doesn’t, get the execution plans (estimated plans are fine here) and compare the the properties of the first operator. That’s the quickest way to identify the issues that could be leading to the differences between the servers.


Want to talk more about execution plans and query tuning? Let’s do it.

In August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’m also going to be in Oslo Norway for a pre-con before SQL Saturday Oslo in September.

24 thoughts on “Same Query, Different Servers, Different Performance. Now What?

  • BWA-HA-HA-HA-HA!

    Seriously though, one time, we had a query running faster in dev than prod. Pulled our hair out for days until someone pointed out that the dev server was new and the processors were faster.

  • Great post, Grant!

    Love the conversation in the comments, just inspired to share:

    I have had some queries that were significantly slower when run with a very high degree of parallelism — one big one was 45 minutes when run on all cores and 20 minutes when run on 8 cores, for example.

    If “max degree of parallelism” was set to the default “free for all” level, there really are some situations where more cores make *some* queries slower, depending on the hardware and what the queries are doing.

  • Jeff Moden

    My general rule is that no one process can have more than 1/4 of the CPU power and I set MAXDOP accordingly. I should probably shift to using the query governor but it’s been a pretty good rule of thumb so far. The box is 24/7 (for the most part) and has both OLTP and Batch running on it at all times of the day.

    If anyone’s code actually needs more than 1/4 the total horsepower, then they have to deal with me and a peer review of the code. 😉

  • davehants

    Yep – MAXDOP can be a killer occasionally. I had a query that was taking 20 minutes and I reduced parallelism and it ran in 2 seconds! Tricky to predict in advance though….

  • Chuck

    Maybe I’m just overlooking it but where in the comparison does it show the compatibility levels? Perhaps you could circle it in red for dummies like me. 🙂

  • alerancur

    I’m having a similar issue where the same query with the same execution plan (only in the GUI) runs slower in production server than in development server. When I say the exec plans are only the same in the GUI is because the XMLs are not exactly the same, the exec plan for the slower server always has a higher EstimatedAvailableDegreeOfParallelism (sometimes x3). This is a fairly complex sproc that calls several functions and other sprocs. I’m leaning towards the parallelism issue mentioned above, is there any specific type of query where you’ve seen this happen the most? I don’t have access to the server configuration, so I want to be sure of this before asking the SQL DBAs to change something in the server configuration, besides it’s not clear to me if this is something you can change for a specific sproc or is a global setting that applies to everything. Any help will be really appreciated.
    PS: I don’t see the plan comparison option in SSMS, I’m using 2012. Was that recently released?

    • If the cost estimates are different, there are probably a lot of other differences between the plans too. There would have to be. Not seeing the plan or the server, there’s little to suggest. For parallelism, especially one between two different servers, I would always look first to the cost threshold for parallelism to see if they are the same. Also the MAXDOP. If either of these values are different, you’re likely to see differences in parallel execution.

      Beyond that, I’ll bet if you look hard enough, especially at a complex query, there are other substantive differences, seeks instead of scans (or vice versa) or differences in the types of JOIN operations done (hash, loop, merge). If the costs are different, there are differences in the plans.

  • alerancur

    Thanks Grant, the costs are slightly different, in the order of decimals (i.e. 0.150478 vs 0.150142), but I assumed it’s because data is not always exactly the same between the two environments, there might be a less than 10 rows difference, which doesn’t justify the twice the execution time between servers

  • alerancur

    Thanks Grant, the costs are slightly different for some queries in the batch, in the order of decimals (i.e. 0.150478 vs 0.150142), but I assumed it’s because data is not always exactly the same between the two environments, there might be a less than 10 rows difference, which doesn’t justify the twice the execution time between servers

  • Those costing differences suggest other differences in the plans. Those are the estimated costs based on the different operations chosen that depend on statistics, indexes, constraints and keys to determine how best to resolve the query. I’m sure that if you’re seeing cost differences, there are also differences in the plans. A special focus would be, if you can capture an actual plan with runtime statistics, to compare the differences between the estimates and the actuals on row counts. I suspect, again, this is speculation without knowledge, that you’ll see differences there, especially between the two servers.

  • In one data server we have Microsoft SQL Server 2008 R2 (SP2) – 10.50.4000.0. In another server we have 2012. The 2008 has compat level of 90 while the 2012 has compat level 110. The same query is generating different plans where one does table scan (times out) and one appears to use an existing index (returns immediately). I used Red Gate SQL Compare to compare the schemas and it said they were the ‘same’ (just few minor diffs in users).

    Looking at the plans on the 2008 server, I have neither the CardinalityEstimatorModelVersion nor the plan comparison option. Any suggestions as to how I might debug? My limited SQL DBA skills are at a loss. Examining the table that had the table scan on it, the indexes and relationships are identical (via manual compare vs just using Red Gate).

    Thanks in advance.

  • ScaryDBA

    Different versions of SQL Server is the most likely cause. Each version of SQL Server (and frequently each Service Pack and sometimes each Cumulative Update) has a different Query Optimizer. Most queries, most of the time, this doesn’t matter much (until you hit SQL Server 2014 or greater where you also toss in the change to the Cardinality Estimator). However, for some queries, those changes matter a lot. Using SQL Compare (HUZZAH!) you’re able to compare the structures and eliminate that. However, you are looking at two versions of SQL Server. That’s a likely difference. Also not mentioned is whether or not the data is the same? Differences in data will lead to differences in statistics and differences in execution plans. Even if the data is the same, when were the statistics last updated and how were they updated? Differences in statistics leads to different row estimates which leads to different execution plans. On top of all that, different servers which could have different memory and CPU settings leading to different execution plans. Also, different cost threshold for parallelism settings could lead to different execution plans.

    Going through those additional checks could explain why you’re seeing different plans, but at the core, you’re still comparing 2008 to 2012. That maybe the fundamental difference.

    • Thanks. It turns out that rebuilding indexes and (or) updating statistics did enable the query to run (vs timing out). On the 2012 server there was actually much less data (so I would have expected it to run faster) than the 2008. So indeed the data was different, but the fact that one did a table scan and one used an index surprised me that it would be so different. Anyway, I’ll keep an eye on it and see if I can add the index that SQL recommended.

      • ScaryDBA

        Without seeing the code or the structures I’m just making an educated guess, but I’ll bet it was the statistics. That’s almost always the case. Plus, rebuilding an index (skipping the fact that it updates statistics) won’t change the execution plan. Index fragmentation isn’t taken into account by the optimizer.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.