Nov 09 2012

PASS Summit Day 3: Dr. David DeWitt

Two quick points: I’m putting this blog together using the Surface… ooh… and this isn’t a keynote, but a spotlight session at the Summit. Still, I thought I would live blog my thoughts because I’ve done it every time Dr. DeWitt has spoken at the Summit.

Right off, he has a slide with a little brain character representing himself.

But, we’re talking PolyBase, and futures. This is basically a way to combine unstructured NoSQL data in Hadoop with structured storage within SQL Server. Mostly this is within the new Parallel Data Warehouse. But it’s coming to all of SQL Server, so we need to learn this. The information ties directly back to what was presented at yesterday’s keynote.

HDFS is the file system. On top of that is MapReduce, a framework for executing distributed fault-tolerant algorithms. Hive & Pig are the query languages. Sqoop is the package for moving data, and Dr. DeWitt says it’s awful and he’s going to tell us why.

HDFS was based on the Google File System. It supports 1000s of nodes and it assumes hardware failure. It’s aimed at small numbers of large files. Write once, read multiple times. The limitations on it are caused by the replication of the files, which makes querying the information from a data warehouse more difficult. He covers all the types of nodes that manage HDFS.

MapReduce is used as a framework for accessing the data. It splits the big problem into several small problems. It puts the work out into the nodes. That’s Map. Then the partial results from all the nodes are combined back together through Reduce. MapReduce uses a master, the JobTracker, and slaves, multiple TaskTrackers.
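To make the Map and Reduce steps concrete, here is a minimal single-process sketch in Python. It is not Hadoop code. The function names and the word-count task are assumptions chosen for illustration, and each string in `chunks` simply stands in for the slice of data held on one node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the partial results from every node by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk stands in for the data stored on one node (hypothetical).
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

In a real cluster the map calls would run in parallel on the nodes holding the data, with the JobTracker handing the work out to the TaskTrackers. Here they just run one after another in a list comprehension.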

Hive is a data warehouse solution for Hadoop. It supports SQL-like queries. It has somewhat performant queries. By “somewhat” he means that the PDW is 10 times faster.

Sqoop is the library and framework for moving data between HDFS and a relational DBMS. It serializes access to Hadoop. That’s the purpose of PolyBase: to get parallel execution across all of Hadoop’s HDFS. Sqoop breaks up a query through the Map process. Then Sqoop runs two queries: a count, and then a reworked version of the original query, a pretty scary one that includes an ORDER BY statement. This causes multiple scans against the tables.
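A rough sketch of why that pattern hurts, using Python’s built-in sqlite3 module. The exact SQL Sqoop generates isn’t captured in these notes, so the LIMIT/OFFSET slicing, the `orders` table, and the `split_queries` helper are all assumptions for illustration only. The point is the shape: one count, then one ordered query per mapper, each of which has to walk the table again.

```python
import sqlite3


def split_queries(conn, table, key, mappers):
    """Build one query per mapper: a count first, then ORDER BY slices.

    Hypothetical helper; the real generated SQL may differ.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    per_mapper = -(-total // mappers)  # ceiling division
    return [
        f"SELECT * FROM {table} ORDER BY {key} "
        f"LIMIT {per_mapper} OFFSET {i * per_mapper}"
        for i in range(mappers)
    ]


def demo():
    # In-memory table with ten rows standing in for the relational source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(i, i * 10) for i in range(1, 11)]
    )
    queries = split_queries(conn, "orders", "id", 3)
    # Every mapper's query re-sorts and re-reads the table on its own.
    return [conn.execute(q).fetchall() for q in queries]


if __name__ == "__main__":
    for rows in demo():
        print(rows)
```

With three mappers the table is read once for the count and then once more per slice, which is the multiple-scan problem described above.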

Dr. DeWitt talks through the choices for figuring out how to put together the two data sets, structured and unstructured. The approach taken by PolyBase is to work directly into HDFS, ignoring where the nodes are stored. Because it’s all going through their own code, they’re also setting up to handle text and other data streams.

They’re parallelizing access to HDFS and supporting multiple file types. Further, they’re putting “structure” on “unstructured data.”

By the way, I’m trying to capture some of this information, but I have to pay attention. This is great stuff.

How the DMS, the stuff used by Microsoft to manage the jump between HDFS and SQL Server, works is just flat out complicated. But the goal was to address the issues above and it does it.

He’s showing the direction that they’re heading in. You can create nodes and objects within the nodes through SQL-like syntax. Same thing with the queries. They’ll be using the PDW optimizer. Phase 2 modifies the methods used.

I’m frankly having a little trouble keeping up.

It’s pretty clear that the PDW in combination with the HDFS allows for throwing lots and lots of machines at the problem. If I were in the situation of needing to collect & process seriously huge data, I’d be checking this out. The concept is to use MapReduce directly, but without requiring the user to do that work. Instead, the user writes T-SQL. It’s seriously slick.

By the way, this is also making yesterday’s keynote more exciting. That did get a bad rap yesterday, but I’m convinced it was a great presentation spoiled by some weak presentation skills.

All the work in Phase 1 is done on PDW. Phase 2 moves the work, optionally, to HDFS directly, but still allows for that to be through a query.

Dr. DeWitt’s explanation of how the queries are moved in and out of PDW and HDFS is almost understandable, not because he’s not explaining it well, but because I’m not understanding it well. But seeing how the structures are logically handling the information does make me more comfortable with what’s going on over there in HDFS.

I’m starting to wonder if I’m making buggy whips and this is an automobile driving by. The problem is, how on earth do you get your hands on PDW to start learning this?

 

 

Nov 08 2012

PASS Summit 2012 Day 2: Keynote

Welcome to Day 2 of the PASS Summit!

It’s been a very exciting event so far. Today I’m presenting two sessions, one on tuning queries by fixing bad parameter sniffing and one on reading execution plans. Please stop by, or watch the one on execution plans on TV as PASS is livestreaming events all day long on SQL TV (which is what I used to call Profiler).

The intro video, which can be good or goofy, was really good this year. They had people from all over the world talking in their native languages, making the point that the PASS organization is a global community. It really is.

Doug McDowell is giving us the finance and governance information for the PASS organization. I find this boring and vital at the same time. We need to know how this organization is managed, if we care about the organization. And we should care since, let’s be honest, this organization has changed many of our lives for the better. I mean through the family we’ve met, the jobs we’ve gained, and just the knowledge that has been shared with us. PASS has doubled its expenses in two years in order to support all the stuff they do, SQL Saturday, Rally, 24 Hours of PASS, etc. It’s amazing.

We have three new board members, Wendy Pastrick, James Rowland-Jones and Sri Sridharan. Congrats guys. You’re crazy for taking part, but thanks for everything you do.

Next up is Tom LaRock, another board member and a good friend. The PASSion awards are great. It’s the people who are doing crazy sick work for the community. Mention goes to Amy Lewis and Jesus Gil. But the award went to Jen Stirrup. Well deserved. She is so active and so passionate. It’s amazing. Congrats Jen and thanks for all you do.

PASS Board members are gathering feedback from the community. If you have an idea, talk to a board member.

Don’t forget to attend the Women in Technology Luncheon. Men and women can attend.

Quentin Clark is now up for the Microsoft part of the keynote. We’re seeing a bunch of people talk about how great SQL Server 2012 is. It really is great. He’s taking off on the concept of the data lifecycle. That’s a pretty interesting topic. He’s talking about how big data is getting both really, really cool and absolutely frightening. Hotels tracking guests within their building, coupons & ads based on the person standing in the supermarket, things like that. We’re actually at the point where we can do things like this. It’s really cool. But wow, that is going to build out some seriously large data sets. The idea is to make gathering, interpreting, and sharing data easy, simple and very, very fast.

We’re starting off with data management. The combination of SQL Server and Hadoop is pretty slick. It’s PolyBase, the new technology announced yesterday. But, please, presenters, don’t leave teeny tiny fonts up on screen while you talk. Zoom in. The room can’t see it. However, that information was very interesting. I like seeing how you can put these things together. Next up is discovering and refining data. We’re going straight into Excel. That’s the bad news. The good news: Access is dying. YAY!

So the demo was poorly delivered, but very well structured. We got a good idea of how exactly we can do this with the new technology. There is lots of setup in the management area and in Excel to prep for what they’re calling the ‘Ah ha’ moment. In other words, this is making your data more and more available, but the work to set it up is absolutely non-trivial. The structures get built out in really interesting ways, especially all the model work you’ll be doing in SSAS in order to prep this data. They’re showing how the Azure Marketplace hooks in. Once all of it is put together, an incredibly difficult task, you can really poke at the data with these new tools. It’s exciting stuff. It’s a shame that the presenters sucked all the life out of it.

Nov 08 2012

PASS Summit 2012 Day 2: KILT DAY!

Welcome to the fourth Kilt Day at the SQL PASS Summit. It might be a little silly, but it’s fun. It’s also Women in Technology day with the WIT Luncheon. Guys are invited.

A short word about the bloggers table. Last year we were… a little loud. So this year, we were cautioned… well, more like told to be quiet or they’d take away our toys. I agree with the intent of the message, please keep it down. But the delivery… it hurt PASS at the bloggers table and upset people. As I was reminded last night by a dear, dear friend who I accidentally hurt, how you deliver a message is as important as the message you deliver.

But, that’s OK. Let’s learn from our mistakes, grow & move on.

IT’S KILT DAY!

Last night, I attended karaoke with the fine people from PragmaticWorks. Thanks guys for a great event and for letting me in the door.

And did I mention, IT’S KILT DAY!

Nov 07 2012

PASS Summit 2012 Day 1

We’re off and running here at the PASS Summit.

New this year is live streaming all day.

Bill Graziano is introducing the Summit. More importantly, he’s introducing PASS. Further, he’s introducing speakers to everyone. He doesn’t mean just speakers at the summit, but anyone who has spoken at a SQL Saturday or a user group, and it was a scary large group of people. PASS has created a new web site to make it easier to find local Chapters. Track one down. On the one hand, it’s weird that we’re sitting at the PASS Summit and introducing the PASS organization, but I think they’re right to do it. It’s a great organization and I’m always surprised at how many people don’t know about it.

Bill’s big announcement is the all new PASS Business Analytics conference which will take place in Chicago in April 2013. More and more people have gotten good at collecting data, but we really do need to work harder on making use of that data.

There are 300 Microsoft engineers this year at the PASS Summit. That’s a serious amount of brain power. No wonder it’s been so warm here in Seattle. That much brain power is going to warm things up considerably. I’m planning on going and talking to these guys. You should too.

Nice work Bill.

Ted Kummert is up for the keynote. He’s showing the team from SQL Server 2012 at their release party. That’s a large group of extremely smart people. I appreciate all they’ve done. SQL Server 2012 is an excellent product. Nice job kids. Don’t get cocky.

The message. Big data. Because let’s face it, we’re getting bigger and bigger data all the time. It’s the harder problem and the sexier problem. However, a lot of us are still working with small data sets and struggling. Don’t forget about us. But, the new In-Memory database that they’re putting out is pretty slick. That’s going to move things very quickly. Plus, as a geek, it gives us more to learn. Wonderful. The new functionality is going to be released with the next version of SQL Server, and it’s going to be a part of the system. That’s pretty cool. Of course, it’ll probably be Enterprise only, but if you need it, it’ll be worth it. This is going to make a big difference in performance tuning. It’ll open up additional opportunities.

Oooh. Management Studio looks radically different. He’s working through a web page. Fair warning, it’s not the finished experience, but it’s very interesting that it might be the direction that they’re heading in. It also resembles Windows 8 a little. The demos look pretty cool. He’s improved performance 30 times by simply moving everything into memory. IO latching and locks just go away. Performance shoots through the roof. I need to get my hands on that. We all do probably.

Seeing how columnstore works within this type of hardware and software is pretty amazing too. Plus, you can update it in the upcoming release and you can cluster it. We’re moving into a new world, people.

But, don’t forget, this requires HUGE hardware, so it won’t be cheap, at all. Plus, it’s not magic. You’ll still be able to completely mess it up. You’ll still be able to write horrible queries or make poor choices in where to apply indexes. TANSTAAFL always applies.

HDInsight is the new non-relational storage engine based on what used to be Hadoop. This is some cool stuff. Plus we’re seeing a newer and bigger Parallel Data Warehouse. I love how things are expanding out so quickly.

The most interesting thing I saw was a new UI for managing SQL Server that was web based. It’s pretty slick, but I’m wondering where that’s going to go.

They also introduced a new thing called PolyBase. It provides a common interface that lets a single query run across Hadoop and structured data. That’s going to be a big deal, but… like everything else, it suggests a lowest common denominator approach. Performance? As Brent Ozar tweeted, if you liked linked servers, you’ll love PolyBase. However, since it’s only in the Parallel Data Warehouse, the hardware may just make any problems go away.

I just can’t get excited about PowerPivot. I agree that it’s a cool thing for business people, but I just can’t get into it. My failing. I know. However, the spatial data display within Excel… that’s slick stuff.

But this has been an interesting and exciting keynote. The new technology coming up from Microsoft is really cool. I think we’re getting a lot of new opportunities to do new things with our data.

Nov 05 2012

PASS Summit 2012: Day -3

The Summit proper starts on Wednesday, but the Summit starts at registration. I left a little early from work setting up for SQL in the City: Seattle in order to run up the hill and get to the convention center around the time that it opened. Why? Cause I get to meet my SQL Family for the first time this week. Lots of people are there and it really is like a family reunion. Smiles, hugs, catching up, stories. It’s the best way to launch the event. Not a lot to report, but I just had to share. I love my SQL Family.

Oct 17 2012

SQL In The City: Seattle

If you missed all the great speakers on the five city tour of SQL in the City, don’t despair. Many of the same people will be back at SQL in the City in Seattle. It’s scheduled on Monday before the PASS Summit proper starts, so if you’re looking to get your learn on early and you can’t sign up for a pre-con, this is a great, free, opportunity to pick up some additional instruction. Check out the list of speakers. It’s going to be an event worth attending.

I’ve seen the early drafts of the feedback forms from the prior five events. People really seem to enjoy this slightly different approach. In short, Red Gate puts on a heck of a show.

During the five city tour, I was able to do three different presentations, two focused on improving database development processes and one on picking up some of the more obscure monitoring metrics. Based on the feedback, these went over well.

One of the biggest hits is my, for want of a better term, sales pitch for a sandbox development process. From talking to people and reading through the feedback forms, it seems that large majorities are hitting some pretty common issues while attempting to develop databases. DBAs, well intentioned, and right, though we may be, are, to a degree, standing in the way of developers going as fast as they can during development. In this session I spend an hour outlining how I think we, DBAs, can fix that problem. It may sound like a developer focused session, but I’m really hoping to get a room full of DBAs and get them all convinced to be on the side of developers and the development process in order to help deliver more code faster into our production environments, but do it in a safe and secure manner. Developers are welcome as long as they bring everything they learn back to their DBA team. But it all starts at the sandbox. Come to my session and I promise to explain it in full.

And, of course, this is another chance to meet, talk to, interact with, and have some laughs with your #sqlfamily. If you’re in Seattle anyway, stop by, learn something, talk to someone, have a little fun.

Sep 24 2012

Interviewing a DBA

I’m not a fan of trivia style interview questions. Yes, I ask a few because you have to in order to immediately eliminate the completely unqualified applicants. Even those types of questions, in my opinion, need to be focused on concepts and not syntax. The reason we have the Books Online with SQL Server is because you shouldn’t have to memorize every possible command along with all their parameters. Want to know how to write a MERGE query? Look it up. What does a MERGE query do? That you ought to know. I think concepts are important. Questions about the recovery models within SQL Server aren’t trivia about the system, they’re trying to get to your understanding of how point in time recovery works.

I don’t really like posting interview questions. And most of the time when I’ve seen interview questions posted (even mine), they’re pretty trivial stuff that doesn’t really get to whether or not the person you’re trying to hire is a good fit for the position and your team. I also don’t like posting interview questions because some people will try to use them to study up and attempt to BS their way into a position they frankly don’t deserve and haven’t earned. SQL Server knowledge and experience comes from using it to solve problems out in the world and protecting the information generated by a business.

That’s why I love this question. And I don’t mind sharing it with you because you can’t really memorize an answer to it:

You get a call from one of the business people. They tell you that the database is running slow. What do you do?

This is completely and utterly open-ended. It can go anywhere. In fact, it’s going to go where you lead it. For example, you could say “I first look at the Windows server error logs.” OK, that’s fine (several people I’ve interviewed started there). What indications would you find there that the server is running slow or what would you find there to show why the server is running slow? Suddenly, maybe you don’t want to look at the error logs for the server any more, or maybe you do. But you get the idea. There is no single correct answer here. There are, however, lots of very problematic paths, and I’m going to let you go down them. I had one guy insisting that the very first thing he needed to do after the phone call was take a look at the application code to see the method used to make the call to the database. We spent quite a bit of time exploring why this seemed to be the best approach to him. Was it? I’m not saying. No hints on this one. Your answer for this question is your answer, and that’s why I love it.

Further, as we explore this question, and I’ve spent anywhere from 10 minutes up to an hour working on it as part of an interview, I’m also getting to see how you deal with problematic situations, what your logic chain looks like, what your understanding of SQL Server is, and, most importantly, how you fit into the team. Because with an open-ended question like this, we get to talk. We’re way beyond silly trivia contests now.

Before you think this is unfair to people who aren’t performance experts, fine, let’s talk about what happens when you get an alert that the server is offline. Not a systems person? OK, we just got an alert that a database consistency check failed, now what? See, the point is to go on an adventure where we explore your knowledge and approach. I just have to work hard to make sure we stay somewhat on topic so that I can assess your knowledge and skill level.

Now, if I approach any of these questions and your response is to reject them out of hand, something I’ve run into, then we’re done. I’m not going to focus on trivia, which is how lots of people prep for interviews. I expect you to have concepts, process, logic, and methods available from your time studying and learning. So if we interview, be ready for this exploration, not a trivia contest. And the only way to really prepare is to get experience and knowledge by actually working with SQL Server.

Oh, and sometimes, I ask questions or make statements that are wrong. Sometimes it’s on purpose. Other times, it’s because I screwed up or was ignorant. But you can’t sit there agreeing with me. You better be paying attention because I might be testing you further.

This type of question is just too perfect for understanding how much you know about SQL Server.

Want to start to prepare for answering this kind of question? I’ve got an opportunity for you. At the PASS Summit 2012 this year, I’ll be running an all-day pre-conference seminar called Query Performance Tuning: Start to Finish. In it, I’ll cover quite a bit of what might make it possible for you to answer this question should you be presented it in an interview. No, I’m not guaranteeing you’ll answer it correctly. I’m just offering a chance to prepare. Sign up for the Summit today. There’s still a discount in place that can help you offset the cost of the seminar until the 30th of September.

Sep 21 2012

PASS Elections 2012

Yeah, it’s that time again. And we have a magnificent slate of people running. I mean truly amazing and wonderful people. I personally know each and every one of them. I’ve worked with them or watched them work on projects over the years. PASS has a true embarrassment of riches this time. Which… is problematic for me. I know them all. I respect them all, but I have to pick and choose… I can’t do it. I really can’t. Yeah, I’ll finally vote. I ultimately put pen to paper (or, really, click on some boxes on the screen) and make my mark. I believe in democratic processes and every vote really does count. And if you’re voting, regardless of who you vote for, your vote is not “thrown away.” That only happens when you choose not to vote.

Anyway…

Despite the internal struggle, I am going to call out one person from that list. Allen Kinsel (blog|twitter) has been a good friend for a very long time. Every single person on that slate is going to do a great job if elected. But I’m calling out Allen because he has an unusually strong passion for PASS. He really does. He cares. And I don’t know if Allen is going to do a better job than all the other great people on that slate. But I do know that he’s bringing serious and real passion to the position. Being a rather passion driven person myself, I respect that and feel it needs to be acknowledged. And again, I’m not questioning the other candidates. I couldn’t, can’t, won’t. They’re great people. I’m just acknowledging one trait, in one candidate, that I feel singles him out, a little. And based on the candidates, we really are down to looking for tiny distinctions at this point.

Best of luck making your own choices. But make sure you do make a choice and vote.

Sep 07 2012

SQL Server vs. Oracle

Just so we’re clear, I use SQL Server. I like SQL Server. But, this doesn’t mean I have anything against Oracle. It’s fine. It’s good. But, I know very little about it. However, throughout my career I’ve found myself needing to understand it better. Either because I’m trying to train Oracle people to better use SQL Server and I need to be able to speak a little of their language to facilitate translation. Or, because I’m defending SQL Server on some technical point that the Oracle people don’t completely understand. Or, because I’ve said something stupid about Oracle in my ignorance.

Now, you know how busy you are, and I know how busy I am, so I doubt either of us has the time we really need to learn Oracle much. So, what do you do? Well, Red Gate Software, who straddles the worlds between Oracle & SQL Server like the Bifrost between Midgard & Asgard, has started a series of conversations between two people who know something about each platform, Jonathan Lewis (blog) and me.

We had our first conversation talking about clustered indexes. We covered how they work in both platforms (not that differently) and how they’re used and abused. Interestingly enough, according to Jonathan, clustered indexes just aren’t used that much within Oracle, despite the fact that they really do behave mostly the same way as they do within SQL Server, where we use them on most every table (or at least so I maintain you should). It was a great discussion (NOTE: not a fight, no one was nasty or mean, we talked).

We’re going to have another discussion. We’re going to be talking about temporary tables. Again, I don’t know much about Oracle, so please, this is not an attack, but apparently they don’t have the same concept of temporary tables as we do in SQL Server. We’re going to cover a lot of the myths and misperceptions surrounding temp tables on both Oracle and SQL Server, how they work and how they affect performance. I learned a lot during the last conversation and I don’t doubt I’ll learn a lot during this one. If you’re interested, please go to this web page and register.

And, I’d be remiss if I didn’t mention again that, if you like learning about performance in SQL Server, you should consider attending the PASS Summit 2012. If you register now, you save $500, which is just enough to pay for my pre-conference seminar, Query Performance Tuning: Start to Finish. I’ll be covering all aspects of performance tuning from gathering metrics to understand which queries are running slow, to reading execution plans to understand why, to addressing the issues to fix the performance and make your queries hum. Please consider taking part. It’ll be a lot of fun and I’ll try like crazy to make it useful.

 

Aug 22 2012

24 Hours of PASS, Fall 2012

It’s time to get your learn on again. The schedule for the Fall 24 Hours of PASS is up and ready for registration. This is the Summit preview session, so many (most, all) of the speakers are showing off some of what you can learn at their sessions at the PASS Summit 2012 itself. It looks like a pretty exciting bunch of topics given by some of the best professionals in the industry.

I’ll be presenting Three Ways to Identify Slow Running Queries on September 20th, 1400 GMT. This is just a sub-set of the information that I’ll be presenting during my all day pre-conference seminar, Query Performance Tuning: Start to Finish. In the full seminar I talk about how to measure the performance of your systems, identify which queries are causing you the most trouble, figure out what that trouble is, and show how to fix those queries. In this 24HOP session, I just focus on three methods you can use, right now, to understand the most costly queries on your servers. If you want to know what to do about them, well, you’ll have to register for the seminar.

A lot of this information is derived from the new edition of my book, SQL Server 2012 Query Performance Tuning. So you can check that out too if you’re so inclined.