I’m putting on a pre-conference seminar (also known as a pre-con) at the PASS Summit this year. I’m really honored to be able to present this and I’m pretty excited about it. So, if you want to talk query tuning, let’s get together at the Summit. For a few fun facts about the event, check out this Q&A over at PASS. To register for the event and my pre-con, go here now.
I took part in the PASS Summit 2014 selection committee this year because I was really curious about seeing how the sausage gets made. I’ve seen how actual sausage gets made and I still eat sausage. Despite a few hiccups and communication issues, internal and external, I think the selection process for the Summit went really well this year. But, there was still some controversy. Being a naturally pushy person, I got involved in the controversy, for good or ill, and subsequently have had conversations with many people about the selection process (which, reiterating, I think went extremely well overall). But, the one thing that kept coming up over and over was a simple question:
How come I/PersonX didn’t get picked?
The easy answer is because you/PersonX had a horrible abstract. But you know what, in probably most cases, that’s not true. Good abstracts by good people didn’t get selected, so what the heck? I think the more complex answer does not go back to the selection committee or the selection criteria or the selection process. Do I think some improvements are possible there? Yes, and I’m putting my foot where my mouth is (or something) and joining the committees to try to make some tweaks to the system to make it better (and really, we only need tweaks; I want to repeat, risking ad nauseam, that the process went well and worked great and I’m happy I took part and I think the outcome is pretty darned good). No, the real problem lies elsewhere: SQL Saturdays.
I’m not saying SQL Saturdays are themselves a problem. What I’m saying is that PASS took on the whole SQL Saturday concept for several reasons, one of which was for it to act as a farm team for speakers. This will be my 10th Summit. Looking back to 10 years ago, while I really loved the event, oh good god have the speakers improved. I remember sitting in sessions with people who were mumbling through their presentations so much that, even with a microphone, you couldn’t hear half of what they said. Slide decks that consisted of 8-12 pages of text (yes, worse than Paul Randal’s slides, kidding, don’t hit me Paul). Speakers who really, clearly, didn’t have a clue what they were talking about. It was kind of rocky back then. I learned my second year that you had to talk to people to find out, not just which sessions sounded good, but which speakers were going to present those sessions well enough that it would be worthwhile. Why were there so many weak presenters? Well, because there was almost nothing between speaking at local user groups and speaking at Summit (I made the leap that way). There were a few code camps around, a couple of other major events, various schools and technical courses, and Summit. I don’t know how the old abstract/speaker review process worked (and I apologize to whoever read my first abstract because I know now just how horrific it was and I’m so sorry I wasted your time), but I’m pretty sure they were desperate to get enough submissions that sounded coherent with a speaker attached that probably could get the job done. Not any more.
Now, people are getting lots of opportunities to present at SQL Saturday events all over the world. And it’s working. We’re growing speakers. We’re growing good speakers. Don’t believe me? Then you go to two or three events in a month, sit through 8-12 sessions, mostly by newer people, not Brent Ozar, not Denny Cherry, not Kim Tripp, and you review them, each, individually, then go back and try to pick the best one. Oh yeah, there’s going to be a few dogs in the bunch, but overall, you’re going to find a great bunch of presentations by a great bunch of speakers. Our farm system is working and working well. But there’s a catch.
Because we have upped the bar pretty radically on all the introductory level speakers (and if you’re thinking about presenting, don’t let that slow you down, everyone starts at zero and goes up), the competition at the top (and yes, I do consider the Summit the top in many ways, not all, see SQLBits) is becoming more and more fierce. That means my abstracts probably need quite a bit more polish than they’re getting (and so do yours) because there are a whole slew of speakers coming up who are writing killer abstracts. That means I need to really be concerned about the evaluations (despite the fact that I get dinged because the stage is low, the room is hot/cold, lunch didn’t have good vegetarian choices, England left the Cup early, all outside my control) because there are new speakers who are knocking it out of the park. In short, you/I/PersonX didn’t get picked because the competition has heated up in a major way.
In short, a sub-section of the community, defined by those who wish to speak, is a victim of the success of the farm team system as represented by SQL Saturday. On the one hand, that sucks because I now need to work harder than ever on my abstracts; on the other, we’re going to see very few instances of really bad presentations at Summit. We’ve improved the brand and the community. It’s a good thing.
I am humbled and honored (and more than a little horrified) to be on this list of the Best of PASS Summit 2013. I mean look at those names. Every single one is a person I look up to and respect and learn from constantly. How I made a list like this… well, thanks. I appreciate the support and kindness that was shown at the PASS Summit when you filled out your evals.
Oh, and while I realize intellectually and SQL skill-wise he totally kicks my behind… Neener, neener Conor. You’re in the DBA track and I’m the only one in the top 10 in the Cloud track.
By the gods, I’m going to pay for that, but it’ll be worth it.
Today we have to eat our vegetables and then get lots and lots of sweet dessert.
Today we hear about PASS Finances as a part of the official annual meeting and then we get to hear Dr. David DeWitt speak (completely and utterly getting our nerd on and squeeing like teenage girls at a Bieber concert).
I will be live-blogging this event, so watch this space.
8:20: Douglas McDowell kicks off the keynote today. The vast majority of the money that runs PASS comes from the Summit. That’s right, by attending the Summit you’re also supporting the organization. The Business Analytics Conference, which kicked off this year, also provides quite a bit more money to the organization.
8:25: PASS has changed its budgeting process. At this point, there is about 1 million dollars (American) in the bank. That means they’ve got a cushion should an event go south. That’s very important.
The amount of money spent on the community last year was $7.6 million. 30% of that is focused specifically for international audiences (which begs the question, how much money comes FROM the international audiences). The money is spent on Summit, BA Conferences, Chapters, SQL Saturday, 24 Hours of PASS and 520 web sites (woof).
8:31: Bill Graziano, PASS President, takes the stage to say goodbye to PASS Board members leaving the board. Douglas McDowell, who was just talking, is leaving the board after six years and being a volunteer since 2001. Rob Farley is also leaving the board. Rushabh Mehta comes on stage after eight years on the board. He’s the Immediate Past President, a role that automatically rolls off the board after a couple of years.
Next up, Thomas LaRock, the current Vice President of Marketing and the incoming PASS President. We had about 3,000 unique viewers online at the live PASS TV (which I launched this morning, talking about KILT DAY!). The new board positions are Adam Jorgensen, Executive Vice President, and Denise McInerney, Vice President of Marketing. Jen Stirrup, Tim Ford and Amy Lewis are coming onto the board.
In 1999, the Summit started. That’s 14 years. I’ve made 9 of them in a row.
8:38: PASS Summit 2014 will be November 4-7 in Seattle next year. The PASS BA Conference will be in San Jose, CA, May 7-9, 2014.
Remember there are tons of networking opportunities.
8:41: What, Why, How Hekaton with Dr. David DeWitt
Let’s get our nerd on.
Dr. DeWitt is one of the things that makes the Summit.
Hekaton is a memory-optimized but durable, very high performance OLTP engine, fully integrated into SQL Server 2014 and architected for modern CPUs. It really is a wicked cool technology. I still don’t buy the concept that you don’t need new hardware for this, but that’s not questioning the utility of the functionality.
OLTP performance has started to plateau with current technology. The increases in CPU just aren’t going fast enough any more, so they have to find something to figure out how to improve performance. The goal for Hekaton was a 100x improvement. They didn’t make that, but they got between 10x and 30x improvement, which is pretty amazing.
You can’t just pin all the tables in memory. Latches for shared data structures are going to hurt, queries still hit locks from the concurrency control mechanisms, and the execution plans generated won’t be improved.
The implication of a buffer pool is that data moves between memory and storage over time.
You’ll need to track down the slides to understand some of what I’m saying in this live blogging. It won’t make sense without them.
So a query needs a page. It checks for the page. The query gets blocked until the page gets allocated and then it continues from there. But another query coming in can be blocked by that process. So, they added latches to the pages in the buffer pool. He shows how the latches allow multiple queries to find objects in the pool while marking them as in use. But this ultimately runs into performance problems because the shared data structures need latches, and latches consume time to maintain.
You also have to have concurrency control, in short, locking and blocking (you know, the stuff that NOLOCK “fixes”). Jim Gray, mentor to Dr. DeWitt, came up with two-phase locking. So a query gets the lock it needs from the lock manager, and when a query releases its locks, other queries can acquire them. This basically sets up the equivalent of a serial order so that things get done correctly.
When the database lives on disk, the processing time to get a query and create a plan can be trivial (not always), but if the data is in memory, that becomes way too expensive.
All this is the reason you can’t pin stuff in memory.
Shared data structures have latches. Concurrency control uses two-phase locking. Query plans are through interpretation.
Hekaton, on the other hand, uses Lock-free data structures, meaning no latches. They’ve stopped using locking for concurrency control. They use versions with timestamps + optimistic concurrency control for Hekaton. And queries are actually, literally, compiled into a DLL. That’s right. COMPILED. Queries have been “compiled” into an interpretation layer all this time. Not literally compiled. But, with this, they’re getting turned into DLLs.
There are now three query engines in SQL Server. Relational, Column Store and Hekaton. These are three distinct stacks. Queries can span all three.
9:06: First, you create a “memory optimized” table. That table does have limits in structure and in the data types supported (look them up).
Second, populate the table, but you have to make sure the data will absolutely fit in memory. You can’t put 5GB of data into a system with 2GB of memory. NO PAGING TO DISK. It’s in-memory, right?
Third, run queries, but there are some language restrictions.
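To make those three steps concrete, here’s a rough T-SQL sketch of my own (the database, filegroup, and table names are all hypothetical, and the exact options are worth verifying against the documentation):

```sql
-- The database first needs a filegroup for memory-optimized data
ALTER DATABASE MyDB ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE MyDB ADD FILE
    (NAME = 'imoltp_data', FILENAME = 'C:\Data\imoltp_data')
    TO FILEGROUP imoltp_fg;
GO

-- Step 1: create a "memory optimized" table; note the required hash index
CREATE TABLE dbo.ShoppingCart
(
    CartId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId INT NOT NULL,
    CreatedDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Step 2: populate it; all of this data must fit in memory
INSERT INTO dbo.ShoppingCart (CartId, UserId, CreatedDate)
VALUES (1, 42, SYSUTCDATETIME());
GO

-- Step 3: query it like any other table (interop mode)
SELECT CartId, UserId
FROM dbo.ShoppingCart
WHERE CartId = 1;
```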
Lock-free data structures: these are truly rocket science. They make query optimization look simple (OW!). They were invented by Maurice Herlihy at Brown University. Really, it’s not about being lock-free so much as latch-free; the concurrency control locks are a separate problem. Dr. DeWitt tells us he could explain it in about 30 minutes, but instead we get a demo.
He’s showing that latched structures slow down more and more as more threads hit the system, yet the throughput of the lock-free approach actually increases. Then, when updates occur, everything on the latched side stops until the update completes. The lock-free mechanism doesn’t stop at all. It doesn’t even slow down. The lock-free mechanisms alone took five years to build.
Multi-version, optimistic, time-stamped concurrency control: the assumption is that conflicts are rare. Transactions are run to “completion” with no locks, and conflicts are resolved later. Multi-version means that updates create a new version of the row, and each row version has a valid time range. Transactions use their begin timestamp to select the correct version. Timestamps are used to create a total order for transactions, to obtain the equivalent of a serial order. This reduces contention, and that reduction reduces the likelihood of conflicts.
Reads see committed versions as of their start time. Updates create new “tentative” versions, and the DB tracks the rows read, written and scanned. Updates then go through a pre-commit step, which gives you validation, and then the concurrency control does its work in post-processing.
Timestamps are just counters. So you get begin and end times, which is how you track the versions. End timestamps are always unique, and that’s how you can manage who goes first in terms of concurrency.
So a row gets tagged with a beginning timestamp and then, when the transaction completes, a unique end timestamp. When an update starts, you get a new version of the row, with pointers linking the versions of the row. A “magic identifier” from the transaction gets assigned to the versions of the row: an end timestamp on the older row, and a beginning timestamp, but no end timestamp at all, on the new row. So, this means no latches were used, no locks were set, and no other transactions were blocked. This creates the basis of multi-version concurrency control.
So if you have two transactions running concurrently, the first transaction creates a new version of a row. Then a second transaction tries to read the row. If the second transaction’s begin timestamp is earlier than the first transaction’s end timestamp, it uses the older version, because the first transaction hasn’t completed yet, so its end timestamp must be later than the current time.
Yeah, that sounds confusing, but looking at the slides you’ll get it.
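If reasoning through the timestamps doesn’t click, here’s a two-session sketch of my own (the table and values are hypothetical; it assumes dbo.Product is a memory-optimized table) showing the practical effect: a reader that starts before an update commits keeps seeing the old version instead of blocking.

```sql
-- Session 1: start a transaction and read the row
BEGIN TRANSACTION;
SELECT Price FROM dbo.Product WITH (SNAPSHOT)
WHERE ProductId = 1;   -- sees the original price

-- Session 2, meanwhile: update the row. This writes a new version and
-- stamps the old one with an end timestamp; Session 1 is never blocked.
UPDATE dbo.Product WITH (SNAPSHOT)
SET Price = 20.00
WHERE ProductId = 1;

-- Session 1, again: its begin timestamp precedes the update's end
-- timestamp, so it still reads the older version of the row.
SELECT Price FROM dbo.Product WITH (SNAPSHOT)
WHERE ProductId = 1;
COMMIT;
```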
Then, a cleanup process has to occur. When the begin timestamp of the oldest transaction in the system ticks past the end timestamp of an old version, that older version will get removed. This cleanup is cooperative, non-blocking, incremental, parallel and self-throttling.
Each version contains a valid timestamp range, and you get transaction visibility through timestamps and versions. A transaction will read only those versions of rows whose valid time range overlaps the transaction’s begin timestamp.
THEN, we have to go through Validation.
1. Transaction obtains a unique end time stamp
2. Determine if the transaction can be safely committed
3. Validation steps depend on isolation level (and check the slides for details).
Each version that was read is checked to see if it’s still “visible” or “valid” at the end of the transaction. This also helps with phantom avoidance. But, everything is in memory and we’re not taking locks, so while validation is expensive, it’s actually still cheaper than the old latching and locking.
Post-processing goes through three phases. You get a log record with all the versions written by the transaction and the primary keys of all deleted rows. A single I/O is written to the log. Then, for all rows in the transaction’s write set, the transaction ID is replaced with the end timestamp.
I sort of understand all this. The trick will be to remember it and then learn how to explain it to others.
But, you have to have checkpoints and recovery. Data is stored in the logs during checkpoint operations, roughly the same as normal. Recovery loads the known checkpoints and scans the logs to recover all work since then. It has full integration with High Availability.
Then we got to queries and query plans. You can run regular queries against memory-optimized tables in what they call interop mode, but you sacrifice performance. Instead, you want to compile them. You get physical plans, kind of the same way as you used to (not quite the same, but I was hitting a snag when he explained that part, check the slides), but then the plan goes through a translator which generates C code. Evidently, really ugly C code. Then the compiler is called and you get a DLL that is 100% specific to that query. The DLL gets loaded and invoked, and you never recompile that query again.
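For what the compiled path looks like in T-SQL, here’s a sketch of my own (the procedure and the memory-optimized table it reads are hypothetical, and the exact required options are worth checking in the documentation):

```sql
CREATE PROCEDURE dbo.GetCart
    @CartId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- This body gets translated to C, compiled into a DLL, and loaded
    -- into the process; there's no plan interpretation at execution time
    SELECT CartId, UserId, CreatedDate
    FROM dbo.ShoppingCart
    WHERE CartId = @CartId;
END;
```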
The number of instructions is interesting. A classic table can take 700 instructions to find a row. With Hekaton, 332 and with a native qp, 75. Even if you don’t find a row, it’s 300, 110 and 31.
Interop can get you up to a 3x improvement, but there are still language limits. Similar restrictions apply in native mode, but you get 10-30x improvements with that.
Finally, he’s going through a whole bunch of performance improvements by various, REAL, companies using it now.
The whole point is that memory prices have been declining and we’re seeing lots of CPU cores designed for concurrency, but single-core CPU speeds have plateaued, so we’re still hurting wherever we run interpreted rather than compiled code. Hekaton’s design is supported by the hardware trends.
I am liveblogging the keynote from the bloggers table at the PASS Summit again this year. Just keep scrolling.
Watching the introduction video as people trickle in. All the other bloggers are setting up. I get in early. I didn’t rearrange the seats this year. I see others doing it now.
8:11: Watching the videos of all the attendees registering and meeting people at the start of the event and last night’s welcome reception is awesome and fun.
8:21: The lights go down and we see the videos of what everyone is looking forward to at the Summit. In keeping with our location, right next to the NASCAR Hall of Fame, we’ve got a bit of a race theme going on. We’re seeing current PASS President Bill Graziano having a dream about driving a car. He’s starting off with the list of PASS Board members, just so you know who it is that’s doing most of the work for this fantastic volunteer-run organization.
We’re also getting a listing of the 700K hours of training that have been put together by the PASS Organization.
8:30: We get to find out who the PASSion award winner is. Amy Lewis, who is absolutely an amazing person, is the winner this year. Ryan Adams, sitting right behind me blogging away, was an Honorable Mention Volunteer. Well done Ryan. They also ask you to nominate outstanding volunteers for the year too. Make sure you do. This really is a volunteer run organization, so you need to support the volunteers.
8:37: Quentin Clark takes the stage with a listing of companies that have adopted Azure technologies. If you read my blog, you know it’s one of my passions (although I’m still a query tuning freak). You need to get going with it.
Quentin is starting with the concept of “A story about transformation.” He’s showing how Brick and Mortar and Internet are helping each other, not hurting each other. Integration between the stores and the internet made things better. The comparison is of course aimed at telling the story between on-premises computing and cloud computing. It’s a compelling story. We’re seeing how they’re rolling out a series of software that is available now, or in the very near future, which is different than past key notes where we saw stuff that was coming out “next year” or “real soon.” That’s an awesome approach.
8:49: We’re seeing all kinds of new technology in 2014. They’re not fundamental changes causing a rewrite of technology. Instead they’re additional technologies, updateable column store index and in-memory tables and indexes for OLTP. It’s awesome. It makes it possible to do more, when you need to, rather than only after rewriting your entire app. I think the work they’re doing in Azure is making it possible for them to release more frequently to the on-premises versions without causing breaking changes. It’s a great way to get things done.
8:54: I love the demos when they are more realistic. We can see a 10% improvement on queries just by using memory optimization. They’re also introducing Native Compilation, which means a true compilation, turning a proc into code built against the structure of the SQL Server instance, not simply a query plan stored and accessed in cache. That resulted in another 11x improvement in performance. The issue around this, though, is that most of this technology is very hardware intensive. You’ll have to have big boxes for this to really help you. So yes, we’re getting great new technology, but you’re only going to be able to really blow it out of the water with other great new technology.
The main point they want to make is that it’s built into SQL Server, which is 100% true. They also want us to know that there is no new hardware needed, which doesn’t make sense. You can’t put stuff in memory without using more memory. It has to impact existing hardware. However, I see the utility of it.
8:59: They’re expanding on the abilities for availability and recovery. Sorry people, but this means taking advantage of additional functionality in Windows Azure. But it works. I’ve seen it in action in production environments. You can set up AlwaysOn secondaries in Windows Azure. You can backup to Windows Azure. It’s not a requirement to migrate your systems out of your environment, but to use the Azure system as your backup and recovery mechanisms.
9:05: The new backup tools are great. In 2014 we get backups that are encrypted without requiring the database itself to be encrypted. That’s great. They’re also making it so that you get backups that can be automated based on data changes, not just on a schedule. That opens up whole new ways to protect your systems. I’m excited by this stuff. I’m also interested to see that they’re releasing a tool that will let you incorporate backups to the cloud from your 2005, 2008 and 2008 R2 systems, not just 2012 and 2014. That’s great, but it hurts companies like Red Gate that have been offering this as a product to people for years. Ah well.
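As a taste of the syntax, an encrypted backup to Azure blob storage looks roughly like this (my own sketch; the storage account, credential, and certificate names are all hypothetical):

```sql
-- Assumes a credential holding the storage account name and access key,
-- and a server certificate to protect the backup encryption
BACKUP DATABASE MyDB
TO URL = 'https://myaccount.blob.core.windows.net/backups/MyDB.bak'
WITH
    CREDENTIAL = 'AzureBackupCredential',
    ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert),
    COMPRESSION;
```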
9:17: Microsoft continues to expand its Hadoop offerings, supporting them through the desktop and through Azure in HDInsight. Most of this stuff is in preview, but they have people using it in their production environments, so it must be relatively solid. The point is being able to query everything, not one type of query for structured data and another for unstructured.
9:28: Mostly talking about BI stuff. I’m glad we’re serving out the data in better and more interesting ways. I just can’t get too excited about it. In the meantime I’m actively configuring a CTP2 of SQL Server 2014 in an Azure VM while the event is going on. People are trying to download it instead of setting up a VM. They’re crazy.
9:45: That was a good key note.
It was pointed out to me that since PASS is such a huge networking event, any employer would be crazy to send a good employee to the event. They’ll just come back and hand in their two weeks notice. You know, that’s entirely possible. But, let’s not confuse networking with job hunting. Funny enough, while I did get my latest job while at the PASS Summit, it wasn’t through the personal network that I had built up over the years of going to, and speaking at, the Summit.
I use that network as an extensive knowledge base. If I have a question about Availability Groups, I have at least three different people I can reach out to. If I get stuck on some internals question, I have other individuals I speak with. I know who to talk to if I get stuck in PowerShell. Think about it. How much more valuable does that make me to an employer? They’re not just hiring me. They’re hiring my network. But that’s only part of how you want to convince your boss to send you to the PASS Summit. Let’s go over a few items that will make it easier for you to convince your employer it absolutely is in their best interest to send you to the PASS Summit.
My Knowledge Base
Maybe this one is obvious, but you should talk to your boss about the addition of more skills to your skill set, an improvement of your overall knowledge and your worth to the company. There are a ton of excellent learning opportunities at the Summit covering the entire length, breadth and depth of SQL Server and its attendant products. These sessions are led by some of the most knowledgeable and skilled people in the industry. Further, they’re practically champing at the bit to have you ask your difficult question so that they can exercise their skills and expand their knowledge. You can learn more, faster, at the PASS Summit than almost anywhere. That’s going to help your employer because you will be a better employee.
Our Current Problem
Just about every year in the 6-8 weeks leading up to the PASS Summit, I would start collecting questions: what particular pain points are we experiencing with the products that I need to grab 10 minutes with a Microsoft engineer to talk about? Oh, didn’t I mention that fact? Yeah, the guys who built the product are frequently at the Summit (although more are there when it’s in Seattle). You can take your immediate problems straight to these people. Further, there’s likely to be an MVP or MCM standing nearby who might be able to help out too. Or, you can try the Customer Advisory Team (CAT), who always have a number of representatives there. In short, you can get pretty close to premier support without wasting a premier support ticket.
Our Future Direction
Your company needs to make decisions about future direction. You’ve seen the marketing hype. Now, what do the people who are working with the newest stuff every day have to say? Can you get more information by attending sessions that are not put on by Microsoft on emerging technologies? Yes, frequently, but not always, you can. The PASS Summit is the place to see this. Microsoft doesn’t just develop things and then toss them over the fence to see what works (mostly). Instead, they have companies and individuals working with them all the time to develop new directions for the product. Those people and organizations are frequently at the Summit, displaying new stuff on the vendor floor or giving presentations about the new directions they’re taking the technology. You can get a better understanding if your company’s plans are going to work well going into the future. Even if the plan is best summed up as “We’ll sit on SQL Server 2000 until it rots around our ears.” Others are doing it too. Find out how it’s working out for them.
Our Team Skill Set
Most companies are not going to want to send the whole database development team, or DBA team, or development team away for a week. Instead, they’ll send one or two people from each team (maybe fewer). So your team loses out, right? Wrong. Two things. First, coordinate. Make sure that you cover as many sessions as you possibly can. Don’t overlap. When I was working on a team heading to the Summit, we would divide up sessions to make sure things got covered that the company needed or that we needed as individuals. While I may want to see speaker X do her session on indexing again, my co-worker has yet to see it, so I’ll send them. And make sure you have a couple of sessions picked for each time period, because the session you’re in could be a bad choice. If a session isn’t for you, for any reason, just walk out. Second, teach. You just spent a week getting data dumped into your brain. Teach it to your team. We made a pact that anyone who went off to an event had to present 2-3 sessions to the team from that event. You can even purchase the event DVD and show sessions to your team in meetings.
NOTE: This is not to say, steal these slide decks to become your internal training resource, unattributed to the original presenter. That is a bad thing.
Who do you want to work for? The employer that says, “Heck no you can’t go to the PASS Summit. You’ll meet people and figure out that our company stinks and you’ll try to get a new job, or you’ll learn more and be more valuable and we’re not about to raise your pay.” Or, the employer who says, “Yeah, sure you can go this year. Let’s document what you’re going to learn and how it’ll help the company.” OK, it’s not going to be that easy. You may have to agree not to leave the company for a year or something afterwards. Be cautious about exactly what kind of strings get attached, but also be aware of the fact that the company is investing in you and would probably expect to get something for that investment. Just be sure it’s fair to both you and them.
I get it that some employers are smaller and just can’t foot the bill for this. See if they’ll meet you part way. You pay for the trip and lodging and they pay for the Summit, or vice versa. It can also be about timing. You’ve got a major software release that’s going to prevent you from going. I almost missed a Summit myself because of this. It’s just not always possible, but a good employer will find a way to make it possible, occasionally. If there is literally no support, of any kind, ever, you’re either working for a not-for-profit or, maybe, the wrong company.
I’ll Be On Call
Be on call. Carry the laptop with you. Keep your phone charged (ABC = Always be charging). Don’t enjoy the evening festivities too much (and yes, there are parties at the PASS Summit). Be a responsible employee. I’ve had to walk out of great sessions because of calls from the office. I missed half a day because of a failed deployment. But I was online and available, not falling off the face of the planet just because I was at the Summit. Make the commitment to be available as needed by your employer.
Take lots and lots and lots of notes. You can type them into OneNote or EverNote or whatever. Or you can scribble them into your tablet or onto notepads. Anything that works. But write stuff down. Write lots of stuff down. Write down what you’re thinking about the information as well as details said by the speaker that may not be visible on slides or in code. Write down what you talked about with that lady from that vendor on the back of their card. Take notes while talking to the Microsoft engineer or CAT member. Then, turn the notes over to your employer. They act as an additional knowledge base about the event. It’s one more resource that you’re bringing back to your team.
Bring home a t-shirt or two for those people who couldn’t go. If there’s a particularly cool piece of swag, give it to the boss or have it as a raffle at the team training event for the best question. Share the stuff you get as well as the information you get. A friend of mine and I once collected 56 t-shirts and a stack of other swag (and had a heck of a time getting it all back on the plane) which we then spent almost two weeks handing out in the office to our team, development teams, managers and systems people, etc. It made us look good and cost us nothing but a little time on the vendor floor. It’s silly, but it works. If nothing else, it shows the boss that you’re thinking about your team and the company while you’re away.
I talked about it at the beginning of this blog post. Network. That means not being “that person.” That person is the one who comes to the event, shows up for all the sessions, doesn’t ask questions or talk to a single person all day, then leaves and goes to their hotel room (and then usually goes home saying “Wow, that was a waste of my time”). There are large numbers of opportunities to network. Waiting in line to register, turn and talk to someone. Ask questions of the presenter during their session AND follow-up afterwards (although, let them get unplugged and out of the way of the next speaker). Go to the vendor floor where you should talk to the vendors as well as others. Attend the First-Timers event. Go to the Birds of a Feather lunch. Wear a kilt on Day 2 of the Summit (SQL Kilt Day, you’re reading the words of the founder of the event). Attend the Women in Technology Luncheon. Look up and track down all the places where people are getting together and talking. Go to them. Get together. Talk.
I’m an introvert (people laugh when I say it, but it’s true). I recharge with alone time, not at parties. I get it. But the PASS Summit is not recharge time. If you’re not almost literally crawling into the venue on Friday, you’re doing it wrong. The flight home should be the most relaxing plane flight you’ve ever had because you’ll pass out before take-off and wake up when the wheels touch down.
Take the time and trouble to begin to build your network. And remember, a network is not a series of authors or MCMs or MVPs that you can call. It’s a collection of people, some may be presenters/authors/etc., but the best are probably doing the same job you do but for a different organization. Talk to everyone. Build that network.
And that’s all I’ve got. Here is a different view from the PASS organization and another from Steve Jones. Yes, the learning and the networking should be enough for any employer, but these things aren’t always immediately valuable. So, try out some of the other strategies and approaches I mentioned. Explain to the boss this is what you’ll be doing. Come up with a written plan. Then execute that plan at the Summit. Your career is in your hands. You have to decide how and where you’re going to expand it. The PASS Summit is a unique opportunity to do just that, but you may need to convince the boss.
You know you want to at least take a look at the new Community Technology Preview (CTP) of SQL Server 2014. I don’t blame you either. I want to spend hours swimming through it too. But, you’re thinking to yourself, “Heck, I’d have to download the silly thing, provision a new VM, walk through the install… Nah. Too much work.” I don’t blame you. I found myself on the road the day the software was released, so I was going to have to attempt all that work on a hotel wireless connection. In short, I was going to have to wait; there were no options. Or were there? Actually, there is a much easier option: Azure Virtual Machines.
And no, it’s not just that I can get a Windows Azure VM ready to go faster than I can a local one (although, depending on just how I set up and maintain my local servers, that might be true). No, it’s that I can immediately get a copy of SQL Server 2014, no download required. It’s that I can, within about five minutes, have a server up and running with SQL Server 2014 installed and ready to go. How? Microsoft maintains a gallery of images for quick setup of Azure Virtual Machines. A couple of those images include SQL Server 2014.
To get started on this, and not pay a penny, you need to make sure that you meet the MSDN requirements listed at that link. I know that some people won’t, and I’m sorry. However, once you get your MSDN subscription set up and linked to an Azure account, you’re ready to go. Throughout this post I’ll refer to paying for Azure; if you’re running through MSDN, just substitute “using up my credits” for “paying” and it should all make sense.
First, click on the Virtual Machines icon.
Clicking on the New button gives you options. Reading the screen, you can tell that you have a list of different services that you can add: Compute, Data Services, App Services, Networks, and Store. If you’ve opened this listing from the VM list, Compute is already selected. That provides a second list of options: Web Site, Virtual Machine, Mobile Service, and Cloud Service. Again, if you’ve opened these options from the VM list, Virtual Machine is already selected; if not, make sure it is. The final two options are Quick Create and From Gallery. For our purposes we’re going to use the Gallery, but let me first explain the difference. Your licenses for SQL Server, Windows Server, and most Microsoft products (so far as I know) are transferable between Azure and your on-premises machines. This means you can create an empty virtual machine on Azure and then load your own licensed software onto it, with no additional licensing fees. But you can also use the images in the Gallery. There, you can set up a VM for whatever is listed, and you get the machine and its software for an additional hourly cost, but with no additional license required. In short, you can pay a little bit more to get access to SQL Server, or what have you, without having to buy a license. It’s a great deal.
Worry about paying for it all later. We’re going to click on the From Gallery selection. This opens up a new window showing all the different possibilities you have for your VMs. You can install anything from Ubuntu to SharePoint to several different flavors of SQL Server. You can even add your own Hyper-V images to this listing (although that does mean paying for licensing on any created VMs). Scroll down until you see SQL Server 2014 CTP1. On my listing there are currently two copies: one that runs on Windows Server 2012 and one that runs on Windows Server 2012 R2. If you want a Start button on your screen, pick the second one. You’ll then be walked through the wizard to get this thing created. After selecting a VM, click on the right arrow at the bottom of the screen.
Now you need to supply a machine name. It needs to be unique within your account. You’ll also have to pick the size of machine you want. This, along with the size of the data you store, is what you pay for. You’ll need to decide how you want to test 2014, small or large. For my simple purposes, exploring 2014, I’m going with Medium. That currently means 2 cores and 3.5GB of memory. You can go all the way up to 8 cores and 56GB of memory, but you will be paying for that, just so we’re clear. You also have to create a user and password for the system. Strict password rules are enforced, so you’ll need a special character and a number in addition to your string.
Next, you need to configure how this machine will behave on the network. You need to supply it with a DNS name, your storage account, and your region. I would strongly recommend making sure that your servers and your storage are all configured for exactly the same region. Otherwise, you’ll pay extra for the data moving between regions, and you may also see somewhat slower performance.
Finally you can, if you want to, add this server to an availability set. For our test purposes we’ll just leave that set to None. But you can make a machine like this part of an AlwaysOn Availability Group in Azure, or take a hybrid approach with it as an asynchronous secondary for your on-premises servers. Oh yes, the capabilities are pretty slick. I would also suggest leaving PowerShell remoting enabled so that you can take advantage of all that it offers in terms of managing your VMs and the processes running within them.
Click on the check mark and you’re done. You’ll go back to the VM window, and at the bottom of the screen you’ll see a little green icon indicating activity. It will take about five minutes for your VM to be created. While it’s running you can, if you choose, watch the process, but it’s a bit like watching paint dry. You’ll see the steps it takes to create your machine and provision it with the OS and SQL Server version you chose.
Once it’s completed, you’ll have a VM with a single disk, ready to go. But, you need to connect to it. Remember that user name and password? We’re going to use that to create a Remote Desktop connection to the server. When the process is completed, the server will be in a Running state. Click on that server in the Management Portal and click on the Dashboard selection at the top of the screen. This will show you some performance metrics about the machine and, at the bottom, give you some control over what is happening. The main thing we’re looking for is the Connect button.
Click on that button and you will download an RDP file from the Azure server. Open that file (and yes, your system may give you security warnings; click past them) and you’ll arrive at a login screen configured for your Azure account. That’s not what you want. Instead, click on “Use another account,” then type in your machine name and user name along with the password. Once you click OK, you’ll be in an RDP session on your SQL Server 2014 CTP1 VM. Have fun!
Remember, you can stop the VM when you’re not using it, and you stop paying for it (or using up your MSDN credits). Just go to the dashboard and use the “Shut Down” option at the bottom of your screen.
If you found this useful and you’d like to learn a lot more about the capabilities of using Azure within your environment, I’d like to recommend you sign up for my all day pre-conference seminar at PASS 2013 in Charlotte. I’ll cover this sort of thing and one heck of a lot more about the future of being a DBA working in the hybrid environment of Azure and on-premises servers.
No, this isn’t some complaint about PASS or the Summit. This is an announcement that not only will I be speaking at the PASS Summit, but I’m speaking about Azure… a lot.
First up, I’m going to be doing an all-day pre-conference seminar specifically aimed at getting you into Azure. No, I don’t want you to drop your on-premises databases and infrastructure. Are you nuts? No, wait, you think I am. OK, fair point. But what I actually want you to realize is that some pieces of your work are better done in the cloud. There are all kinds of terribly fun and cool things you can get done there, as an addition to your existing infrastructure. There are way, way too many things that are better done locally to ever think you’d move it all away from the iron that you control. No, I want you to learn how to be a better DBA/database developer/developer/architect. That’s what this session is all about. The title is: Thriving as a DBA in the World of Cloud & On-Premise Data.
Then, in a spotlight session, I’ll show how to get into query tuning when you’re working with Windows Azure SQL Database. Things there are just a little different. But, here’s the coolest thing, query tuning is more important than ever. You can literally save your company money by tuning queries. This session is: Query Performance Tuning for Azure SQL Database.
In addition to that, I’m going to co-present with Dandy Weyn in a regular session to try, again, to show you that the hybrid approach is absolutely the best choice for the future. Dandy and I will be having a good time presenting: Being the DBA of the future – a world of on-premise and cloud.
Finally, I’m taking part in a professional development session all about working from home. As cool and wonderful as it sounds, it’s actually not that easy to do and do well. Not only do you have to keep your boss happy and deliver what’s needed to keep your job, but you have to keep your spouse and family happy. You might also want to try to keep yourself (mostly) sane. Best of all, I’m part of a great team presenting this: Thomas LaRock, Karen Lopez, Steve Jones, Erin Stellato, Kevin Kline, Andy Leonard, Aaron Bertrand
Check out the full listing of sessions for the PASS Summit. It’s going to be a fantastic event. And, we’re going to be on the East Coast this time, so if you’ve been hesitant to go in the past because of the cost & time to travel, this is your opportunity to go in your own back yard.
Two quick points: I’m putting this blog together using the Surface… ooh… and this isn’t a keynote, but a spotlight session at the Summit. Still, I thought I would live blog my thoughts because I’ve done it every time Dr. DeWitt has spoken at the Summit.
Right off, he has a slide with a little brain character representing himself.
But, we’re talking PolyBase, and futures. This is basically a way to combine Hadoop’s unstructured NoSQL data with structured storage within SQL Server. Mostly this is within the new Parallel Data Warehouse (PDW). But it’s coming to all of SQL Server, so we need to learn this. The information ties directly back to what was presented at yesterday’s keynote.
HDFS is the file system. On top of that sits a framework for executing distributed, fault-tolerant algorithms. Hive & Pig are the SQL-like query languages. Sqoop is the package for moving data, and Dr. DeWitt says it’s awful and he’s going to tell us why.
HDFS was based on the Google File System. It supports thousands of nodes and it assumes hardware failure. It’s aimed at small numbers of large files: write once, read multiple times. The limitations on it are caused by the replication of the files, which makes querying the information from a data warehouse more difficult. He covers all the types of nodes that manage HDFS.
MapReduce is used as a framework for accessing the data. It splits the big problem into several small problems and puts the work out onto the nodes. That’s Map. Then the partial results from all the nodes are combined back together through Reduce. MapReduce uses a master, the JobTracker, and slaves, multiple TaskTrackers.
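To make that shape concrete, here’s a toy, single-process word count in Python. This is purely my own illustration, not anything shown in the session: Map turns each input chunk into key/value pairs, and Reduce combines the partial results per key. The real framework does the same thing, only distributed across the TaskTrackers.

```python
from collections import defaultdict

def map_phase(chunk):
    # Each "node" maps its chunk of input to (key, value) pairs
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Partial results are grouped by key and combined
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two chunks standing in for data spread across two nodes
chunks = ["big data big problems", "big nodes small problems"]
pairs = [p for chunk in chunks for p in map_phase(chunk)]  # Map runs per chunk
counts = reduce_phase(pairs)                               # Reduce combines them
print(counts["big"])   # 3
```

Trivial here, but the point is that neither phase cares how many nodes exist, which is what lets you throw hardware at the problem.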
Hive is a data warehouse solution for Hadoop. It supports SQL-like queries, and its queries are somewhat performant. By “somewhat,” he means the PDW is 10 times faster.
Sqoop is the library and framework for moving data between HDFS and a relational DBMS. It serializes access to Hadoop; that’s the purpose of PolyBase, to get parallel access across HDFS instead. Sqoop breaks up a query through the Map process. It runs two queries: first a count, and then a reworked version of the original query, a pretty scary one including an ORDER BY statement. This causes multiple scans against the tables.
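As a rough illustration of why that hurts, here’s a small Python sketch, entirely my own with made-up table and column names, of a Sqoop-style import generating one count query plus per-split queries that each re-sort the whole source table. It’s a simplification of what Sqoop actually emits, but it shows where the repeated scans come from.

```python
def sqoop_style_splits(table, split_col, total_rows, num_splits):
    """Return the queries a Sqoop-like import might generate.

    First a COUNT(*) to size the table, then one windowed query per
    split. Each split re-sorts the entire table before slicing off its
    window, which is why the source gets scanned multiple times.
    """
    count_query = f"SELECT COUNT(*) FROM {table}"
    rows_per_split = -(-total_rows // num_splits)  # ceiling division
    split_queries = [
        f"SELECT * FROM (SELECT * FROM {table} ORDER BY {split_col}) AS t "
        f"LIMIT {rows_per_split} OFFSET {i * rows_per_split}"
        for i in range(num_splits)
    ]
    return count_query, split_queries

count_q, splits = sqoop_style_splits("orders", "order_id",
                                     total_rows=1000, num_splits=4)
print(count_q)     # SELECT COUNT(*) FROM orders
print(splits[1])   # the second split's full-sort-then-slice query
```

Four splits means four full ORDER BY passes over the same table, which is exactly the serialization problem PolyBase is meant to get rid of.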
Dr. DeWitt talks through the choices for figuring out how to put together the two data sets, structured and unstructured. The approach taken by PolyBase is to work directly with HDFS, regardless of where the nodes are stored. Because it’s all going through their own code, they’re also set up to handle text and other data streams.
They’re parallelizing access to HDFS and supporting multiple file types. Further, they’re putting “structure” on “unstructured data.”
By the way, I’m trying to capture some of this information, but I have to pay attention. This is great stuff.
The DMS, the component Microsoft uses to manage the jump between HDFS and SQL Server, is just flat-out complicated. But the goal was to address the issues above, and it does.
He’s showing the direction that they’re heading in. You can create nodes and objects within the nodes through SQL-like syntax. Same thing with the queries. They’ll be using the PDW optimizer. Phase 2 modifies the methods used.
I’m frankly having a little trouble keeping up.
It’s pretty clear that the PDW in combination with HDFS allows for throwing lots and lots of machines at the problem. If I were in the situation of needing to collect & process seriously huge data, I’d be checking this out. The concept is to use MapReduce directly, but without requiring the user to do that work; instead, they write T-SQL. It’s seriously slick.
By the way, this is also making yesterday’s keynote more exciting. That did get a bad rap yesterday, but I’m convinced it was great material spoiled by some weak presentation skills.
All the work in Phase 1 is done on PDW. Phase 2 moves the work, optionally, to HDFS directly, but still allows for that to be through a query.
Dr. DeWitt’s explanations of how the queries are moved in and out of PDW and HDFS are almost understandable, not because he’s not explaining them well, but because I’m not understanding them well. But seeing how the structures logically handle the information does make me more comfortable with what’s going on over there in HDFS.
I’m starting to wonder if I’m making buggy whips and this is an automobile driving by. The problem is, how on earth do you get your hands on PDW to start learning this?
Welcome to Day 2 of the PASS Summit!
It’s been a very exciting event so far. Today I’m presenting two sessions, one on tuning queries by fixing bad parameter sniffing and one on reading execution plans. Please stop by, or watch the one on execution plans on TV as PASS is livestreaming events all day long on SQL TV (which is what I used to call Profiler).
The intro video, which can be good or goofy was really good this year. They had people from all over the world talking in their native language, making the point that the PASS organization is a global community. It really is.
Doug McDowell is giving us the finance and governance information for the PASS organization. I find this boring and vital at the same time. We need to know how this organization is managed, if we care about it. And, let’s be honest, this organization has changed many of our lives for the better: through the family we’ve met, the jobs we’ve gained, and just the knowledge that has been shared with us. PASS has doubled its expenses in two years in order to support all the stuff they do: SQL Saturday, SQLRally, 24 Hours of PASS, etc. It’s amazing.
We have three new board members, Wendy Pastrick, James Rowland-Jones and Sri Sridharan. Congrats guys. You’re crazy for taking part, but thanks for everything you do.
Next up is Tom LaRock, another board member and a good friend. The PASSion awards are great; they go to the people doing crazy, sick work for the community. Mentions go to Amy Lewis and Jesus Gil, but the award went to Jen Stirrup. Well deserved. She is so active and so passionate. Congrats, Jen, and thanks for all you do.
PASS Board members are gathering feedback from the community. If you have an idea, talk to a board member.
Don’t forget to attend the Women in Technology Luncheon. Men and women can attend.
Quentin Clark is now up for the Microsoft part of the keynote. We’re seeing a bunch of people talk about how great SQL Server 2012 is. It really is great. He’s taking off on the concept of the data lifecycle. That’s a pretty interesting topic. He’s talking about how big data is getting both really, really cool and absolutely frightening: hotels tracking guests within their building, coupons & ads based on the person standing in the supermarket, things like that. People are actually to the point where we can do things like this. It’s really cool. But wow, that is going to build out some seriously large data sets. The idea is to make gathering, interpreting, and sharing data easy, simple, and very, very fast.
We’re starting off with data management. The combination between SQL Server and Hadoop is pretty slick. It’s PolyBase, the new technology announced yesterday. But, please, presenters, don’t leave teeny tiny fonts up on screen while you talk. Zoom in. The room can’t see it. However, that information was very interesting. I like seeing how you can put these things together. Next up is discovering and refining data. We’re going straight into Excel. That’s the bad news. The good news: Access is dying. YAY!
So the demo was poorly delivered, but very well structured. We got a good idea of how exactly we can do this with the new technology. There’s a lot of setup in the management area and in Excel to prep for what they’re calling the “ah-ha” moment. In other words, this is making your data more and more available, but the work to set it up is absolutely non-trivial. The structures get built out in really interesting ways, especially all the model work you’ll be doing in SSAS in order to prep this data. They’re showing how the Azure Marketplace hooks in. Once all of it is put together, an incredibly difficult task, you can really poke at the data with these new tools. It’s exciting stuff. It’s a shame that the presenters sucked all the life out of it.