If you get an execution plan that looks like this:
I wouldn’t blame you for immediately thinking about query tuning. Especially if the code that generated it looks like this:
SELECT soh.OrderDate, sod.OrderQty, sod.LineTotal FROM Sales.SalesOrderHeader AS soh INNER JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID WHERE soh.SalesOrderID IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99);
Let’s replace this with a table variable, maybe even one passed in as a parameter. The plan then looks like this:
Ah, much prettier. I’m happy now, all’s right with the world… But, just in case, let’s look at performance. The first query ran in about 2.2ms and had 599 reads. The second query ran in about 24ms and had 598 reads… crud.
Well, let’s modify everything again. Instead of a table variable, we’ll use a temporary table and get some statistics into this puppy which will clean things right up. Here’s the new plan:
Looks pretty familiar, although there are slight differences in the cost estimates between this plan and the preceding one. But the run time is 85ms with 714 reads AND I had to create the temporary table which added time to the whole thing.
Doggone it, that other plan is heinous and ugly and so is the query which uses an IN clause. Shouldn’t the cleaner, simpler, execution plan be an indicator that we’re going to get better performance?
The thing is, just because the execution plan is simple and easy to understand does not mean it’s going to perform well. You can’t simply look to an execution plan to understand performance. You have to measure the run times on the query, look to the resources it’s going to use in order to understand where waits are likely, look at it’s reads, and take all this into account, along with understanding what the execution plan is doing in order to make the appropriate choices for performance on your system.
I kept working with this because I was convinced I could get faster performance. The main difference as I saw it was that the optimizer sorted the data in the IN clause and I wasn’t explicitly sorting the data in any of my replacement queries. But nothing I did resulted in better execution times. And that was upsetting.
First, when you’re tuning a query, you’re going to look at the execution plans, as I did above. But, when you want to measure the performance of queries, it’s a very good idea to turn off execution plans and just capture the query metrics. I knew this and was doing it and you could see the results in the Extended Events where I was capturing each statement for the SPID I was working within. I also had the SET STATISTICS IO and SET STATISTICS TIME enabled for the query. Since each execution would cause those to fire as part of the statements and those were making my ExEvents window messy, I decided to turn them off… WHOA! Query execution times radically changed.
In fact, my first attempt at tuning the query, substituting a table parameter, was suddenly faster than the original. The fastest was when I pre-sorted the data in a temporary table (discounting the costs of sorting and inserting the data into the temp table just for the moment). In fact, the prettiest plan was indeed the fastest.
Experimenting further, it was the STATISTICS IO that completely changed the execution times.
In short, pay no attention to my original post above, instead, let the lesson be that I need to be very cautious about the Observer Effect.