Why Don’t People Use Columnstore Indexes?

I saw this question on SQL Server Central the other day and had an immediate, visceral reaction. I know why.

Now, before I explain my answer, please, let me reassure you. I get it. You’re busy. What I’m about to suggest is not meant as a direct critique of you. It’s just an observation of the human condition. Heck, maybe I’m wrong. So, before you write the angry screed about how busy you are and why you can’t possibly do what I’m about to suggest, believe me, I already understand. I’m still going to suggest something that’s going to make some of you angry.

Common Knowledge

If you’ll permit me, I want to talk about Extended Events before we talk about Columnstore.

SIDE NOTE: Standing invitation, any time I’m at an event, if there’s time, ask me, and I’ll do an impromptu class on Extended Events, why you should be using them, how to use them properly. Just ask.

Extended Events were introduced in 2008.

They sucked.

Seriously. They were not quite ready for prime time. Believe me. I tried to use them, but they just weren’t functional. 2008R2 didn’t improve the situation. Others, such as Jonathan Kehayias, saw the inherent potential, and started to use them. I, on the other hand, stayed firmly with #TeamTrace.

Then, Microsoft fixed Extended Events. 2012 just rearranged the landscape. They were awesome. I started using them. Now I’m on #TeamExEvents and I couldn’t be happier about it.

However, it was too late. Common knowledge was firmly established. Extended Events suck (so very wrong). They’re hard to use (they’re not). Extended Events don’t capture as much data as Trace (yeah, they capture more). They don’t have as much functionality (ok, with the exception of marrying Perfmon data to Trace Data, Extended Events actually have a lot more functionality). The GUI is horrible (well, it is, until you understand how to use it properly, see my invite above).

Columnstore And Common Knowledge

Columnstore indexes were introduced in SQL Server 2012.

They sucked.

OK, that’s actually strong. They were awesome. However, the fact that they were read only seriously limited their adoption. Common knowledge was very quickly established that most of us would never use these. They’re a horror to maintain. Yeah, for certain problem spaces (you know, that tiny problem space called “data analysis”), they’re useful, yet, most of us won’t benefit and the pain is too high.

Then, in SQL Server 2014, updateable columnstore indexes were introduced. The world changed, but no one noticed. Why?

Common knowledge.

It was already common knowledge that columnstore indexes didn’t work for most of us.

Fact is, that’s not true. Now that we have clustered columnstore and non-clustered columnstore, you can go nuts. Most of your data access is through analytical channels? Awesome, use a clustered columnstore. Sometimes though, you need point lookups. Not a problem, add a nonclustered b-tree index to the clustered columnstore. Go here to learn more about Columnstore Indexes.

In short, today, we can completely orient our data storage with our principal data access. Yet, most people are not using these things at all.

Because, common knowledge is already established.

It’s Common Knowledge

Common knowledge is a trap. Don’t believe me?

Common Knowledge of Page Life Expectancy
Everyone Knows
  • It’s common knowledge that your Page Life Expectancy has to be greater than 300 or there’s a problem on your server.
  • Everyone knows that the best way to deal with excessive parallelism is to set the MAXDOP = 1 on your servers.
  • CXPACKET waits are a completely useless wait statistic and should be ignored
  • Never write a query with more than three or four joins

I can keep going, but here’s the point. Every one of these statements is either utterly false, or has huge caveats attached. For example, CXPACKET waits aren’t useful… prior to SQL Server 2016 SP1 (or maybe a CU, I forget). After that, in other words, with all the modern versions of SQL Server and Azure SQL Database, CXPACKET waits are strong indicators of issues with your system. Don’t ignore them.

Yet, all this “common knowledge” is constantly being perpetuated. I still can’t believe I see the suggestion that three joins are too many. This was a myth in SQL Server 6.5, let alone today. Yet, it’s out there.

Why?

You!

It’s you.

You’re not spending your time learning. You’re learning once. You’re evaluating a technology once. You’re picking up the common knowledge one time, then never assessing it, ever again.

I get it. You have work to do. Learning isn’t a part of your work. Oh, but there you’re wrong. It is. Technology moves. You are a technologist. You must learn in order to move with the technology. You must get in the habit of reassessing assumptions and reevaluating that the common knowledge that you have, might be wrong, old, not applicable.

Because, the next thing most people do after they learn something is repeat it. “Hey boss, found the answer to the problem. Put FAST 1 on every query and it’ll run faster.” or whatever myth we’re perpetuating today. Then, it gets added to our “Best Practices” documents, built into the monitoring tool and repeated every time you answer a question from a Junior DBA or online.

Conclusion

As I said at the top, I understand. There’s not enough time in your day/week/month/year to keep up with the changes to technology. I’m sorry, but you’re just going to have to make some time. Carve out a piece of your day/week/whatever to get some learning in. Get to your local user group on your own time. Go to SQLSaturday. Read blogs by trusted industry experts who are keeping up with the technology shifts. Do what is necessary to reevaluate the “common knowledge” you have in your head.

Do this because, when you don’t, you’re not simply hurting yourself. It’s not that, oh, you’re not keeping up with the cool kids. It’s because you’re misusing, abusing, not using, tools that would improve your organization. Extended Events is an amazing technology. Learn it (see my invite). Columnstore fundamentally changes the data access and performance tuning game. Figure out how you can use it. Your organization will benefit. Further, you, and your career, will benefit.

Done. You can be mad at me now.

53 thoughts on “Why Don’t People Use Columnstore Indexes?

  • Sean Redmond

    I think that you are correct about Microsoft constantly releasing new features that are only half-baked and this forming the unnecessarily low impression of them in our heads.

    I remember learning about Columnstore indexes in an update course for SQL Server 2012 for us SQL Server 2008 people. They were something for data-warehouses our instructor told us. Read-only but very fast. Lots of gotchas too. Very useful in a few specific circumstances.

    And this is what stuck. Future versions of SQL Server had things like ‘improvements to columnstore indexes’ but, by then, the idea of columnstore indexes was in the realm of reporting and SSAS, not the busy world of OLTP.

    • Thanks for the feedback. Yeah, I hear you. A dear friend of mine told me that they wouldn’t use Extended Events until they could see the query text in the results. It took me 10 minutes of standing there with my mouth open before I realized that they were referring to knowledge that was out of date by more than seven years.

      We could throw rocks at Microsoft for this. However, let’s face it, stuff moves, improves, changes. Even if Microsoft releases something fully functional like Query Store, the way it functions, what it provides, the stuff you can do with it, shifts with every update. Query Store today, three versions later, is not Query Store from 2016. So we can’t beat them up for always releasing stuff half baked. Query Store was fully operational day one. Then, they changed it and made it better. However, if you only have the day one knowledge, you might think, “well, queries are nice, but I’d love to see the wait stats” which are in there now. Or, ” well, we like it, but we want tighter control over which queries get capture” which was added in 2019.

      Our jobs aren’t easy. We have to learn how to do stuff. Then learn the business so that we correctly apply the stuff we learn. THEN keep up with changes to both so that we’re still making good decisions. None of this is easy.

    • David LEAN

      The challenge for Microsoft is development time. I sat thru many meetings where someone outlined some fantastic new feature. eg: Distributed DB’s.
      But as we broke it down it became obvious that it would be a 9 year development cycle. So it was broken up into staged releases. eg: Distributed DW (stratus), Distributed OLTP, Scale up, etc)
      Usually looking to help solve the most painful problems, & often niche scenarios, first. Or the need to set a foundation.

      We saw this with the introduction of Mirroring. Everyone looked at the tech introduced before it, claiming “oh if only you’d done this extra bit & made it mirroring”. The dev team knew it, had it in their plans, but couldn’t do it all in one release.

      The annoying thing for me was that VP’s would then assess adoption rates too early, immediately after the product shipped. So many v2.0 phases got shelved ie Management DW. & it became a dead feature.

      • None of that can be easy. I’m amazed some things survived long enough to be improved and made useful. Extended Events comes to mind. In 2008, they were a horror and adoption is STILL low because of it. I could see how, what is now a fantastic tool, could have been cancelled before it had a chance to develop.

  • Chris Wood

    As both you and I know this is a life task – you have to keep on learning as everything changes. Sometimes quickly and sometimes slowly. But it does change and will keep on changing.

  • sbrickey

    one of many areas that Microsoft is “testing the waters” for adoption, and ends up delivering a half-assed solution.

    other examples that come to mind:

    – why can’t SQL Expensive Edition perform online index rebuilds for indexed views?

    – why can’t I use FORCESEEK query hints on indexed persisted computed columns?

    – why can’t I use the spatial functions across SRIDs? I get the technical differences between SRIDs, but every other database platform that supports GEO datatypes also provides SRID translation functions.

    – why can’t I use partitioned views for limiting spatial queries – aka 50 tables for spatial data on each US state with check constraints on state, so that queries limited to FLORIDA only use the FL table.

    – why can’t I use spatial indexes WITH b-tree indexes – aka BTree on State then spatial within btree?

    (in general there seems to be a few issues with spatial trees across tables)

    I can go on… but yes, I find many that features aren’t fully baked.

    I want to try some of the graph tables and functions, but I just don’t know whether I’ll be able to trust the performance… Hell, I can’t even imagine releasing the graph tables in 2017 without the ShortestPath function that finally appeared in 2019…given that I can use standard relational tables instead, and in doing so i CAN control the performance very carefully, converting to graph tables adds risk with very little reward.

    I should add… it’s not just sql… it’s .Net, Windows, etc… WPF had the same issue – first few releases had sucky performance… .Net has tons of issues with compatibility, to which the response has been “but we didn’t break the old stuff, we just told you to move everything over to the new stuff, and then redo all the testing to find out what breaks”… Pepperridge Farm remember when Microsoft would spend a lot of time testing and verifying new releases.

    and thus is the constant struggle with the recent trend of Microsoft’s “Release early, release often [, release half-baked]” modern lifecycle.

    • We could always switch over to open source where everything just works… oh wait. A lot of the same issues. However, there you could, if you wanted, and had time, start learning coding and fix the stuff you wanted to fix.

      I don’t think this is all that special to Microsoft. Most organizations (my own included) go with the Minimal Viable Product (MVP) approach to development so that you can throw stuff at the wall to see what sticks. The problems arise when something sticks for you, but not everyone else, so the organization doesn’t follow up on fixes and functionality.

      However, none of this changes the fundamental issue that we, IT people as a whole, don’t spend the time we need to learning stuff.

  • This is related to my “rant” that when I started with SQL Server it was possible to be an “expert” on the entire SQL Server system because it was easy. It was a RDBMS, that was it. Now, there’s that, SSAS, SSRS, SSIS, it runs on Linux, there’s Azure, and so much more. So yeah, it’s not possible to know everything, but you’re right. That’s no excuse not to continue to learn.

    As you know, I recently jump on the #TeamExEvents bandwagon and am not looking back.

    Learning new stuff is one reason I’m writing so much for Redgate’s Simple-Talk. It challenges me!

    Evolve or die.

  • Grant, you are the man. This post explains exactly what pushed me to become a speaker. You think columnstore indexes are bad? Try indexed views! Despite the improvements, that feature has been torn to pieces – even by some MVPs! Based on what you said about resistance to learning extended events, it makes sense though. Implementing indexed views is not something that should be done without extensive monitoring, testing, and research – a task that would be considerably easier if you already have some experience using extended events!

    I have a question though: when you do learn new features of SQL Server and if you learn “too quickly” for anyone to establish best practices for a given feature, then is that just as bad as this “common knowledge” problem? I don’t think so, I think you could consider it to be a calculated risk. Who knows, you might be the one in the first position to establish best practices if you do it correctly before anyone else! Thoughts?

    • Thanks.

      Yeah, it’s entirely possible to get ahead of the knowledge and misuse a technology. I’ll throw myself under the bus here. If you look at how I used Extended Events in the 2014 edition of the query tuning book, it was wrong. Bad even (not dangerous mind, just less effective). I’ve learned better since, but that bad info is out there now.

  • I’m following a pilot on Youtube (Captain Joe) and he always ends his video’s with “A good pilot is always learning”. The same goes for a DBA. You have to keep learning. If you don’t sooner or later you’ll hit some sort of wall. In my case, i have to be a few steps ahead of clients to be able to advise them. And i’m honest when i don’t know something or only heard of things.
    One of the things i need to invest time in (besides query tuning, azure, SQL 2019, PowerShell and dbatools) is extended events. I’ve tried them on 2012 and just didn’t get it. That was 7 years ago and i have to make up for lost time.

    My boss helps me by making sure i get time to learn on the job. I can go to conferences or spend a day watching Youtube vids from the experts. As long as i share knowledge.

    A few days ago i heard the term “see one, do one, teach one”, coming from the medical side. The first two parts of these are ease. The third one will be one of my major goals in the near future. Blogging is a way but maybe i should try and start speaking or something. Though i still struggle with the “why would people come and see me…”.

    • Same reason people come and see most people. It’s the content. Not the speaker. Sure, sometimes, some names, draw people in regardless of content. However, most of the time, it’s the content. You know your stuff. You’d be good.

  • Colin Allen

    Excellent article. I come across many DBAs who have obviously gone into fine detail at some time but not kept up to date, so they will go on about multiple tempdb files on dedicated disks and identical initial size and autogrowth with trace flag 1118 and 1117 enabled, but not mention the flags are not required after 2016 or using locally attached SSD. On the obverse I was quick to adopt Policy Management, but improvements have been slow to come through.

  • Love your article and agree completely. I’ve been scouring the interwebs and can’t find good “full scope” examples for how to use Extended Events. I have found several that list scripts to create, turn off and on and then remove the sessions, but I can’t find anything that shows how to view those sessions after they’re running. Do you have any suggestions or know of any good examples for how to read the sessions once they’re created?

  • Magnus Ahlkvist

    I got a new manager many years ago now. I have since changed jobs several times, but what he said on the first team meeting stuck with me:
    “Hey guys, I want to get to know you, what you do and what you’d love to do. Let me start this meeting with a question: ‘How many of you guys would like to drive a taxi instead of being a programmer?'”
    Everyone was of course wondering what this geezer was talking about. Driving a taxi? No, we were all programmers or architects. Then he continued: “As your manager, I can only provide a certain amount of training for you. The budget for training will never be big enough for you to learn all the things you should want to learn. You have to put in work on your own time as well. If you don’t, well you’re tomorrow’s taxi drivers.”
    It was a pretty harsh way to start our relation, but it was also true. Back then, I was developing COM components and classic ASP web sites. There’s no way I would have learned .NET programming without spending weekends and evenings learning. When I started specializing in SQL Server, same thing.
    So yeah, I agree with every word in this article. It’s hard and sometimes I wonder for how long I’ll keep up. What used to be “SQL Server database engine” is now on-premise and cloud data platform. I have to prioritize, because I don’t want to become a generalist who is half bad with a lot of things. And I hope I will prioritize the right things. Otherwise, I’ll have to put in even more work to catch up with the things I thought I could live without. Powershell was for a long time one of these things I thought I could live without, and had to do God knows how many hours of catching up after putting the kids to sleep at night.

  • While I totally agree with you on a main premise of being stuck with an old knowledge, I disagree on the semantics and the reasons. It’s not a common knowledge, but rather a first impression. Unfortunately, most of us don’t have the time to revisit everything that we have tried in the past and it sucked because we are highly impatient. We have been constantly preached that it’s just works is a new norm, so when it doesn’t work, we are quickly moving on. Additionally, Microsoft sometimes have a tendency to kill features that didn’t perform as planned, so in some cases it is not even worth the time.

    • You’re right about the first impression. However, the issue is, the technology moves, but far too many voices are just repeating that first impression, over & over. That’s why I say it’s “common knowledge” because it’s repeated by everyone, even though it’s wrong.

      Nothing I can do about Microsoft dropping technology.

  • Andy Macdonald

    Grant, as soon as I upgraded to SQL 2016, I started converting our warehouse over to clustered columnstores. The performance improvement, not to mention space savings, was astronomical. I can’t recommend them enough, and you’re right, people rely on outdated knowledge to solve today’s problems.

  • Erik Darling

    How much is Red Gate’s monitoring tool using Extended Events to find problems?

    Does it use anything other than Extended Events?

  • Tom

    Actually, “common knowledge” is not a trap. What you describe as “common knowledge” is more frequently known as “actual experience”, which is enormously powerful. When you have learned from actual experience that columnstore indexes suck,
    you would be an idiot to then ignore what you have seen with your own lying eyes, and believe the Microsoft (and affiliated) “advocates” instead. You know they lie. Experience does not. When the columnstore indexes improve, you can decide when to try the new and improved version. But, when the advocates are just saying “You are an idiot to believe your own lying eyes”, that is not persuasive. We know the advocates tend to lie.

    • I’m an advocate (I guess, given I write books under the Microsoft Press imprint and have been awarded an MVP three times). One thing I’ve learned as a speaker, author, and advocate, is that lying is caught out REALLY fast, especially with a large body of work. If I make mistakes, I’ll own up to them. I don’t know why Tom is angry, but hopefully he can find closure elsewhere.

  • Andrew K

    I’m one of the ones glad that columnstore and the associated technologies were released in 2012; limited though it still made a big improvement in our data warehouse queries, and was widely adopted in my workplace.

    There were some tables which we would have liked to convert to columnstore but couldn’t because of “missing” features which were added in later versions, but nobody here has been left with a bad experience of columnstore due to what was available in 2012, very much the opposite.

  • It would be nice if you recorded this and posted a link to the recording:

    SIDE NOTE: Standing invitation, any time I’m at an event, if there’s time, ask me, and I’ll do an impromptu class on Extended Events, why you should be using them, how to use them properly. Just ask.

  • Del

    Our problem, just like for a lot of people is that we are not using the latest and greatest version of SQL server.
    We have just gone live in the last 12 months with a large client and we went live on SQL server 2014. (development did start on SQL server 2012).
    We have no idea when that client will upgrade to a newer version of sql server.
    For our legacy application we support many clients running versions of SQL server from SQL server Express and SQL Server 2008. It is almost impossible to include code that only exists in a newer version of SQL server even if there is a client using that version.

    • When you’re stuck with older versions, of course the problem space is entirely different. You can’t possibly use new functionality on old systems. I understand that. It’s when you get to the new system. Upgrade is over. Pain complete. Off we go, doing old things, old ways, and wondering why we bothered to upgrade since it’s only a little bit better. That’s my concern.

  • Yves

    Hi, very nice article.
    I’ve discovered both columnstore and extended event recently because a lot of our clients just migrate theirs databases without using the new feature (because of the cost) or they just don’t migrate (I’ll migrate soon a database from 2008 to 2019!).
    I totally agree that human have a very strong resistance to any change, and when the first experience is not so good, we don’t want to persist (i.e My first experience with SQL Server partitioning were very bad coming from Oracle).
    As the time available to stay up to date, it’s a big issue! You can’t always work night and week end to be aware of any change if you are using more than 3 or 4 systems, which is quit frequent for a DBA.
    Thanks a lot,
    Yves

    • ScaryDBA

      Honestly, partitioning still stinks.

      I hear you. Keeping up, let alone staying ahead, is a drag, on many levels. I still say it’s worth it for most of us, most of the time. There is a payoff, despite the cost.

  • Marios Philippopoulos

    Enjoyed reading this article. I think a big issue is columnstore indexes have not (yet?) been incorporated as part of the missing-index recommendations that one can get in query plans or the missing-index DMVs. As a result, it is difficult for people to integrate them into a query-tuning methodology. I recently came across being able to leverage the Database-Tuning Advisor as a source of such recommendations. Is there any other way to obtain these recommendations right now in SQL Server 2019?

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.