Quite a few years ago, I wrote a post about SELECT * and performance. That post had a bit of a click-bait title (freely admitted). I wrote the post because there was a really bad checklist of performance tips making the rounds (pretty sure it’s still making the rounds). The checklist recommended a whole bunch of silly stuff. One silly thing it recommended was to simply substitute ALL columns (let me emphasize that again, name each and every column) instead of SELECT * because “it was faster”.
My post, linked above, showed that this statement was nonsense. Let’s be clear, I’m not a fan of SELECT *. Yes, it has some legitimate functionality. However, by and large, using SELECT * causes performance problems.
SELECT * Hurts
The most fundamental place where SELECT * hurts performance is very straight forward. You’re moving everything. Do you truly need everything? If so, I guess SELECT * is the way to go. However, if not, if you’re just moving everything just in case, then stop. Everything could include stuff that’s additionally painful to your I/O, memory, CPU and network. Do you really need to move that JSON column every time? Do you have any idea how big it is, how much data that represents? You’re going to pay more for every resource for a “just in case” style solution.
The next place SELECT * hurts performance is in the ability of the optimizer to optimize your query. Because you’re choosing to return everything, then, no matter what else happens, the optimizer must go to where the data is stored, your clustered index or heap. Now, maybe your query is filtering on the key for your clustered index. Great. It’s going to be able to perform a seek, find the row or rows, and then return everything, inflicting the pain we mentioned up above.
Maybe though, you have an non-clustered index that’s better suited to support your query. Cool. Well, except for the part where you are returning everything. So, the optimizer has to do a key lookup back to the clustered index (or a RID lookup to the heap) to get all the rest of your columns. Now, you could fix this, in some situations, by running an INCLUDE for all columns on the non-clustered index, effectively doubling your data storage, not to mention the overhead of maintaining all that data over time. So, that’s a value of “fixed” that might not be tenable for most of us.
A limited column list could mean a single index lookup. Yay!
Let’s also talk a little bit about contention. The more stuff you move around, and SELECT * moves everything, then, the longer it takes to read a given page. The longer that page read takes, the longer other things wait. The longer one thing waits, the longer others wait. We can keep going. Moving everything is certainly going to lead to resource contention.
Also, several people pointed out in the comments on the old post, SELECT * leads to code breaking. People will code, expecting a particular column order, or a particular set of columns. Then, SELECT *, which doesn’t guarantee any particular order on the columns, brings back a different order or additional columns. Code goes BOOM! Yeah, maybe they shouldn’t have coded it that way. But then again, you should have used a specific column list.
We can get into a bunch of other, more granular examples, but you get the idea. SELECT * can truly, deeply, inflict lasting pain on your database. I don’t recommend it’s use. Sure, there are exceptions. However, and I say this a lot, exceptions should be just that, exceptional (as in, rare). If your standard rule is that everything you do is an exception, it ain’t an exception any more, hoss.
I hope this helps clarify things just a little.