HDInsight and Buggy Whips


I sat through the big announcements at the 2011 PASS Summit about the partnership between Microsoft and Hadoop. I recognized it as… interesting, but not necessarily earth-shattering. I was aware of the NoSQL movement and understood that it addressed a pain point that structured data couldn’t adequately address, but that was really it. Then I sat through the sessions this year showing the stuff coming out with the Parallel Data Warehouse (PDW) and PolyBase. Then I watched Dr. DeWitt promise me that I’d be seeing similar functionality within the main SQL Server tool and suddenly… I looked at my structured data knowledge, glanced back at the stuff going on in the keynotes and, frankly, they really did look a lot like automobiles.

Quick clarification: I’ve mentioned it several times on my blog, but I’ll repeat it. I started in IT a gazillion years ago at the birth of desktop publishing. A wise friend at the time pointed out to me that all the people still doing hot and cold typesetting were dinosaurs. They were manufacturing buggy whips as the Model T drove by them. It was true then, and a valuable lesson. I try hard to constantly learn, grow, stretch, and develop my knowledge and skill set because I’m quite frankly very scared to be caught working on buggy whips. So, back to HDInsight.

I think there are a number of trends occurring in and around development and databases. One thing seems clear, at least to me. Structured data is going to be around for at least another 10 years. But, the shiny, that’s occurring elsewhere. That shiny is in places without data, which I frankly have no interest in. It’s also in places with really interesting data integration. Data collected in weird and wondrous ways, which is then transformed into different forms, probably structured data as we (assuming you’re a data nerd like me) are all used to, but also other forms. Think of the things going on within PolyBase, as Dr. DeWitt talked about at PASS Summit 2012. All the new and interesting ways that OLTP is changing are a great deal of shiny for me. So, I decided to get started. I installed HDInsight (Microsoft’s distribution of Hadoop).

I’m not going to insult your intelligence by talking about the install process. It’s too simple. Download the Web Installer on a supported system and the rest is Next, Next, Install. But, now I’m going to start blogging about this new learning. Yes, I’ll keep blogging about execution plans, query tuning, database design, community, my flirtation with various technologies through the POF and the Surface, and maybe new things such as integrating a Windows Phone into the mix (what can I say, shiny). But, at night, until another book project starts taking up all my spare time (ha! spare time), I’m going to learn about this new method for collecting data and I’m going to share in the experience.

First piece of interesting knowledge: you get nothing in the way of a management tool. When the install is done, all I can see is Python 2.7. I’m going to go find some “Getting Started” guides to figure out what the heck to do next.


  • Anne Hills

    Ok. Veeeeeeeerrrrryyy interesting. I will follow your bloggings on this topic closely. If unstructured data is a car, not sure if structured data is a buggy whip or a motorcycle. I’m thinking it’s the bike and will have vital uses right up there with all else. However, if truly buggy whip, the sooner I digest that, the better.

  • Well, it’s not so much the unstructured data, we’ve had that. It’s the upcoming ability to marry the two that I think is the most exciting. But, because of that upcoming functionality, I want to make sure I’ve got at least a decent foundation in managing unstructured data.

    Do a search on PolyBase. That’s the really exciting stuff.
