SELECT * Hurts Performance, Badly

Quite a few years ago, I wrote a post about SELECT * and performance. That post had a bit of a click-bait title (freely admitted). I wrote the post because there was a really bad checklist of performance tips making the rounds (pretty sure it’s still making the rounds). The checklist recommended a whole bunch of silly stuff. One silly thing it recommended was to simply substitute ALL columns (let me emphasize that again, name each and every column) instead of SELECT * because “it was faster”.

My post, linked above, showed that this statement was nonsense. Let’s be clear, I’m not a fan of SELECT *. Yes, it has some legitimate functionality. However, by and large, using SELECT * causes performance problems.

SELECT * Hurts

The most fundamental place where SELECT * hurts performance is very straight forward. You’re moving everything. Do you truly need everything? If so, I guess SELECT * is the way to go. However, if not, if you’re just moving everything just in case, then stop. Everything could include stuff that’s additionally painful to your I/O, memory, CPU and network. Do you really need to move that JSON column every time? Do you have any idea how big it is, how much data that represents? You’re going to pay more for every resource for a “just in case” style solution.

The next place SELECT * hurts performance is in the ability of the optimizer to optimize your query. Because you’re choosing to return everything, then, no matter what else happens, the optimizer must go to where the data is stored, your clustered index or heap. Now, maybe your query is filtering on the key for your clustered index. Great. It’s going to be able to perform a seek, find the row or rows, and then return everything, inflicting the pain we mentioned up above.

Maybe though, you have an non-clustered index that’s better suited to support your query. Cool. Well, except for the part where you are returning everything. So, the optimizer has to do a key lookup back to the clustered index (or a RID lookup to the heap) to get all the rest of your columns. Now, you could fix this, in some situations, by running an INCLUDE for all columns on the non-clustered index, effectively doubling your data storage, not to mention the overhead of maintaining all that data over time. So, that’s a value of “fixed” that might not be tenable for most of us.

A limited column list could mean a single index lookup. Yay!

Let’s also talk a little bit about contention. The more stuff you move around, and SELECT * moves everything, then, the longer it takes to read a given page. The longer that page read takes, the longer other things wait. The longer one thing waits, the longer others wait. We can keep going. Moving everything is certainly going to lead to resource contention.

Also, several people pointed out in the comments on the old post, SELECT * leads to code breaking. People will code, expecting a particular column order, or a particular set of columns. Then, SELECT *, which doesn’t guarantee any particular order on the columns, brings back a different order or additional columns. Code goes BOOM! Yeah, maybe they shouldn’t have coded it that way. But then again, you should have used a specific column list.

Conclusion

We can get into a bunch of other, more granular examples, but you get the idea. SELECT * can truly, deeply, inflict lasting pain on your database. I don’t recommend it’s use. Sure, there are exceptions. However, and I say this a lot, exceptions should be just that, exceptional (as in, rare). If your standard rule is that everything you do is an exception, it ain’t an exception any more, hoss.

I hope this helps clarify things just a little.

11 thoughts on “SELECT * Hurts Performance, Badly

  • hollis

    I typically write ‘exists’ filters as ‘exists ( select 1 from [table] )’, but I’ve wondered if there is actually any performance difference between that and ‘exists ( select * from [table] )’? I would think SQL Server would optimize the latter, but I still do this anyway.

    • You can also use SELECT 1/0 in an exists. You won’t get an error, it won’t affect performance. Truth is, in an EXISTS clause SQL Server simply doesn’t care about what you type after SELECT. The sole purpose of EXISTS is to check for existence of rows, not to check their contents.

      I personally prefer to use SELECT *. One reason is that this is how the ANSI standard defines the EXISTS clause. I’m not 100% sure whether other things are actually forbidden, but I AM 100% sure that EXISTS (SELECT * … is the official syntax, so any product will support this. Other constructions might be vendor-dependant.
      A second reason is that the * in SELECT * works as a subtle reminder that we’re looking for existence of a row, without caring for the data i that row.

      But performance wise, there’s no difference.

  • I have to disagree with you, Grant.
    Even if you need everything, you still should not use SELECT *. Go the extra mile and type out those columns. Or use a smart tool, such as standard SSMS drag and drop from the object explorer. Or a certain commercial tool that I won’t name because marketing for SQLPrompt is your job and not mine. Do not use SELECT * at the outer query level. Ever.

    Why? Simple. You need all columns. Cool.
    But one year from now, someone adds five extra columns. Now those too will be transported across the network, wasting bandwidth. Plus potential other issues with indexes no longer being covering.
    Then yet another year later, I get to do an impact analysis. My manager wants an estimate of the effort for changing the FirstName column from varchar(50) to nvarchar(80) so we can go international. I do a hard search on my code for ‘%FirstName%’. Do you think the stored proc with SELECT * comes up?
    Four months after that, we finally get around to deleting the Religion column that we haven’t been using since GDPR. It’s not the right-most column in the table. If I had listed the columns, I would (again) have found the impact here during impact analysis. Or if I had skipped that I would find it during automated unit testing. With SELECT *, the test will pass.So now I might find it during integration testing, or not find it at all until bug reports pour in, depending on how the client code processes the incoming rows.

    Just three examples. I’m sure there are more. Really, SELECT * has no advantages over providing the full column list. None at all.

  • It seems to never get old. In production I would fear the select * for relying on how the columns are ordered at a given time. Exceptional use case is to develop/explore to check what to expect out of the blue. It comes then with his TOP friend.

  • Andrew J Kelly

    In today’s world SELECT * really hurts when you use Column Store. Realistically you almost never need every column and with Column Store any additional columns i the list that are not required have a huge penalty for performance. The bottom line is always specify just what you need.

  • Thomas Franz

    I often use SELECT * in the outer query of a subquery, where it should be fine:

    SELECT sub.*
    FROM (SELECT col_1, SUM(col_2) AS sum_col_2 from sub.tbl GROUP BY col_1) as sub

    ———
    Another VERY important point are stored procedures and views, where someone used SELECT *.
    Views will be compiled at creation time and procedures (usually) when they are executed first.

    When you drop an unused column from a table or drop two computed columns and add them again with a little bit changed formula but in reverse order or they were in the middle and are now simply at the end), the view will still return the columns in the original order but with the new content, so that the values are switched (which can lead to riddiculus and hard to find / debug errors):

    Watch the gross / net values switching their position in the following example:

    USE tempdb
    DROP TABLE IF EXISTS dbo.tbl
    CREATE TABLE dbo.tbl (id INT IDENTITY NOT NULL PRIMARY KEY
    , amount INT NOT NULL
    , price INT NOT NULL
    , vat DECIMAL(5, 2) NOT NULL DEFAULT 0.2
    , total_price_gross AS amount * price * (1 + vat)
    , total_price_net AS amount * price
    )
    INSERT INTO dbo.tbl (amount, price)
    VALUES (5, 10)
    , (10, 20)
    , (20, 30)
    , (30, 20)
    go
    CREATE OR ALTER VIEW dbo.v_tbl AS
    SELECT *
    FROM dbo.tbl AS t
    go
    SELECT * FROM dbo.v_tbl AS vt
    SELECT vt.total_price_gross FROM dbo.v_tbl AS vt WHERE id = 1

    GO
    ALTER TABLE dbo.tbl DROP COLUMN total_price_gross
    , total_price_net
    ALTER TABLE dbo.tbl ADD total_price_net AS ROUND(amount * price , 2)
    , total_price_gross AS ROUND(amount * price * (1 + vat), 2)
    go
    SELECT * FROM dbo.v_tbl AS vt
    SELECT vt.total_price_gross FROM dbo.v_tbl AS vt WHERE id = 1
    GO
    DROP VIEW IF EXISTS dbo.v_tbl
    DROP TABLE IF EXISTS dbo.tbl

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.