Oct 13 2011

PASS Summit Day 2–Key Notes

Bill Graziano has come out on stage, looking marvelous, in a traditional kilt and stockings. Thanks Bill.

For those who don’t know, Day 2 at the Summit every year is Kilt Day.

[8:19]Outstanding volunteers being recognized are Tim Radney and Jack Corbett. These are some outstanding people who work their bottoms off for the PASS Community. If you meet them, thank them.

The 2011 PASSion Award goes to Lori Edwards. She’s simply amazing. Congratulations Lori and thank you for all the work you’ve done!

[8:23]Time to eat our vegetables. We’re looking at the financials. It’s a slightly painful process, but important to understand where the money goes since this is a non-profit organization managed by volunteers. You should understand where the money comes from and where it goes.

[8:25]Quentin Clark is the keynote speaker from Microsoft. We’re seeing a bunch of people talk about the new functions in SQL Server 2012 (née Denali). There really is a lot of new functionality coming up. Some of it is quite exciting. Some of it is probably edge-case stuff for really big systems. Regardless, there’s lots to learn.

[8:31]SQL Server 2012 is the biggest release ever, especially when you take into account that SQL Azure is part of the common code base. Quentin Clark is going through his “Fantastic 12”. First up is Required 9’s & Protection. Integration Services is a server, they’re introducing HA for StreamInsight, and there’s AlwaysOn. We’re getting a testimonial from the Mediterranean Shipping Company. I’m just not a fan of testimonials. Show me demos or teach me stuff, all the rest is marketing.

[8:41]Testimonial done, we’re getting some demonstrations of AlwaysOn. They’re showing how the wizard can be used to build out a true topology of mirroring servers. It really is cool to watch this happen live. I’m going to be spending some time with it myself.

It’s great how they’ve set up a single listener to manage connections so that code is no longer necessary to manage the capabilities.

[8:45]The blogger table got a little rambunctious during the demo because the guy doing it had very small font sizes and mostly didn’t use ZoomIt. When he occasionally did use ZoomIt, people started cheering. Big tip: just because the screen is large doesn’t mean people in the back of a large room can see what’s there.

Seriously, the demo was good, but you need to make sure your presentation skills are up to snuff.

[8:48]Blazing Fast Performance is built on a set of performance enhancements, but the big one is the ColumnStore Index. #3 is Rapid Data Exploration, which is some of the stuff from yesterday. #4 is Managed Self-Service BI, in short PowerView and PowerPivot. But there’s also expanded management from SharePoint and they’re adding Reporting Alerts, which is really huge.

[8:51]#5 is Credible and Consistent Data. They’re working off the BI Semantic Model, cloud data through shared information, mentioned yesterday. They’re also expanding out Data Quality Services and Master Data Services.

[8:53]Lara Rubbelke is showing how to manage the data. She got a quick dig in at using Excel for everything, which was funny if a little subtle. She’s using SharePoint and T-SQL. Better still, she used ZoomIt and large fonts so not only did we get a great demo, but we could see it. I’m excited about this stuff in ways that I wasn’t before the demo, which is the purpose of these demos. Well done Lara. Thanks.

And yes, there is more cloud stuff in here. We’re going to the cloud, people.

This is some snazzy stuff and Lara is making it look great. I’m excited about the concept of business people being able to set up alerts on reports that they work from.

[9:02] #6 is Organizational Compliance. They’ve expanded Audit and added user-defined auditing and filtering. They’ve added User-Defined Server Roles (already wrote that up in my new SQL Server In a Month of Lunches book).

#7 is Peace of Mind. That breaks down to production-simulated Application Testing (WHOOP!). They’re expanding System Center Advisor & Management Packs for SCOM. They’re expanding critical support to a Premier Mission Critical level.

[9:06] #8 is Scalable Data Warehousing. #9 is Fast Time to Solution, in short, appliances. They are releasing an Optimized and pre-tuned appliance. They’re working with vendors to have hardware & software ready for plugging in instantly to your system. Better still, they’re providing your choice of hardware.

They’re working with, based on the logos, HP & Dell to create these things. These are very nice ways to get yourself a major system in pretty much no time.

[9:19]#10 is Extend Any Data, Anywhere. They’re working with PHP, Java & Hadoop as mentioned yesterday. But they’re announcing a Linux driver in order to convert from “something” to SQL Server (huh, wonder what “something” is, not). Finally they’re expanding FileTable, 2D spatial and semantic search.

And a demo from Michael Rys. The room just got smarter. Although now, we’re looking at a screen that is unreadable. Same thing with the new app, which sounds cool. Then he zoomed, with a comment “For the Zoomit Fans” and we got to actually see his app working. Huge applause from the bad kids at the bloggers table.

This is pretty neat stuff. I love seeing new code at work. This is slick and very powerful. We’ll put that stuff to work. EXCELLENT demo. Exciting stuff.

[9:28]#11 is Optimized Productivity. In short, Juneau, or the new SQL Server Data Tools. It’s still Visual Studio… enough said. They’re talking about changes that make things unified across Database & BI. They’re creating Deployment & Targeting Freedom too. The main thing is a new embedded express version that doesn’t require an actual install to have a database ready to go. That sounds great!

#12 is Scale on Demand. AlwaysOn, of course. Deployment across Public & Private combined deployments, and Deployment & Targeting Freedom.

The demo is going to be good. We started with an example of adaptive learning when the presenter came out on stage and said he’d been reading the tweets and immediately used ZoomIt so we could see the screen. Yay!

We got to see the dacpac at work with Azure and the bacpac (god I hate that name) at work as well. It’s really cool. Oh, and now they’re setting up a method to connect from SSMS to the Azure Storage so you can restore your bacpac files locally. Nice work.

[9:40] And a second Demo! Cool. Cihan Biyikoglu is showing Elastic Scale.

Oooh, just saw an execution plan in Azure. I’m excited.

[9:50] That was actually a really decent keynote. I wasn’t bored. I got enough technical information that I’m leaving feeling a bit excited about what’s coming out. Well done guys!

Oct 13 2011

PASS Summit So Far

This is Day 2 of the Summit proper. But for me, this is the fifth day of the Summit and my sixth in Seattle. Sunday was the opening of registration and it was like a high school reunion with people that you really love. Registration itself only takes about three minutes, but I was there for almost two hours talking to people, friends from previous PASS Summits, SQL Saturdays, SQL Cruise, and SQL In The City.

Monday I put on a pre-conference seminar with Gail Shaw. We had 120 attendees. Despite our worries and multiple contingency plans, we had more than enough material for the time (you try coordinating 7 hours of material with someone from South Africa who has less bandwidth than my phone). It went off wonderfully. Gail and I had a blast.

Tuesday I did some learning of my own and then attended the new First Timer’s session at the conference. It was fantastic. Much better than last year. If you were part of the First Timer’s program this year, I’ll bet you felt the love. Everyone involved with putting this on, from the Big Sisters/Brothers to the PASS Staff, well done. Don Gabor’s lightning networking was amazing to see. The Quiz Bowl was wonderful, thanks Tim Ford & Louis Davidson. After the reception it was time to go down to the SQL Server Central Party that’s put on every year by Steve Jones and Red Gate Software. It was a little different to work the party instead of attend it.

Wednesday I live-blogged and tweeted the keynote sitting next to one of my favorite people Jen McCown, 1/2 of the Midnight DBA team, and another of my favorite people, Denny Cherry. We heard the official name for Denali will be SQL Server 2012 and we heard a bunch of marketing talk (except for a very short demo from Denny Lee of Microsoft that was interesting).

The big moment for me came at 10:05AM Seattle time when Red Gate Software announced the contest to actually, literally, send a DBA (or data professional) on a rocket, into space. Go to DBAInSpace.com to check it out.

Then I got my learn on. I went to Buck Woody’s cloud session and got some good insights from Buck (and a couple of laughs). Next, I went to see another of my favorite people, Jes Borland, present for the first time ever at the PASS Summit. She absolutely rocked. Then it was my turn to present to 120 people on Dynamic Management Objects. I thought it went well, including dropping in a joke slide for Paul Randal, 55 Ways to Safely Shrink Your Database, which will be on the final recording (lordy I hope they let me talk again at the Summit).

It’s been a great Summit so far.

May 10 2011

PASS Summit 2011 Abstracts

I’ve put in several abstracts for the 2011 Summit. This year we’re voting for preferred sessions. If you’re interested in any of the ones I’ve listed below, please consider giving me a vote. I was very kindly invited to submit for a spotlight session (for which I am very grateful and humbled, again) so I put two in for that. I also put in for two regular sessions. This year, for the first time, I put in not one, but two abstracts for all day pre/post-conference sessions. One of them was put together as a partnership between Gail Shaw (blog|twitter) and myself. I’m excited by that one.

I love speaking and I really hope I make the cut.

In the interest of sharing, these are the abstracts I’ve submitted:

Spotlight: DMOs as a Shortcut for Performance Tuning
Dynamic Management Objects (DMOs) expose a wealth of information to the database administrator. However, they also expose information that is vital to the database developer. More often than not people gather performance metrics through server side traces, but they don’t have to. This session will show how to gather information from the DMOs for currently executing and recently executed queries. The session will demonstrate combining this information with other DMOs to get more interesting information such as the query plan and query text. I’ll show where you can get aggregate information for the queries in cache to determine which queries are being frequently accessed or using the most resources. I’ll show how to determine which indexes are being used in your system and which are not. All of this will be focused, not on the DBA, but on the query writer, the developer or database developer that needs information to tune and troubleshoot data access.

Spotlight: Reading An Execution Plan
This presentation will be about execution plans and nothing but execution plans. I will spend the entire session showing you as much as possible about all the information available to you inside execution plans. I’ll show you how to dig into the plans to gather all the data there that tells you what happened with your query. We’ll go to places in the plans that people just don’t think to look at and explore how information from these places informs you about the operations and methods of the optimizer and the storage engine. From this session I want you to learn how to read a plan for yourself. Once that’s in hand, you’ll never need anyone’s help tuning a query again.

Session: SQL Server Backup and Restore for the Accidental DBA
You’ve either volunteered or had the position thrust upon you, but here you are. You’re the DBA. You are being looked to as the person who will protect the company’s data and you really don’t have a clue where to start. Let me suggest that one of the first things you should do is put together a good plan for backing up your database. This session will focus on the best practices, standards and methods that you can employ to ensure that you have a solid backup process for the databases under your charge. You’ll also learn how to restore these databases, because your backups are only good if you can restore them. We’ll also go over some of the questions you should be asking your business, because data recovery is as much a business decision as a technical one. At the end of the session, you should be able to go back to your office with confidence that you can begin to protect your data.

Session: Creating a Winning Abstract
You’ve decided that you’d like to try out this technical presentation thing, but you’re expected to write this odd little document called an abstract. What the heck is an abstract? This session will attempt to answer that question. It will also provide some methods and best practices for improving your abstracts and possibly improving your chances of getting selected. I’ll be working from failed and accepted abstracts of my own and examples from others, again, both failed and accepted. We’ll talk about what makes an abstract work and what makes an abstract ugly. You’ll be able to take home a few new ideas for building abstracts of your own that can help to get you started making presentations at your local user group, SQL Saturday, and possibly even international events like the PASS Summit. (and yes, if this one doesn’t get selected, it’s the last time I submit it, anywhere, ever)

PreCon: SQL Server Query Performance Tuning: Start to Finish
One of the most common problems encountered in SQL Server is the slow running query. Once a query is identified as running poorly, people frequently don’t understand how to diagnose and fix the problem. This one day seminar focuses exclusively on these two topics, identifying the queries that are performing badly and figuring out how to fix them. We start by learning how to gather performance metrics including both server metrics and query metrics using tools available directly from Microsoft such as performance monitor, DMOs and Profiler. From there we’ll move into learning how the optimizer works and how it uses statistics to determine which indexes and other database objects can be used to assist the performance of a query. The session takes considerable time to show exactly how to generate and read execution plans, the one best mechanism for observing how the optimizer works. We’ll then look at other DMOs that can also assist you when performance tuning queries. With all this knowledge gathered, we’ll move into looking at common performance problems, how they evidence themselves in the metrics and execution plans, and how to address them. Finally, we’ll explore advanced methods for solving some of the more difficult query performance problems introducing such concepts as query hints, plan guides and plan forcing. Through all of this, best practices and common techniques will be reviewed. Attendees will go home with a working knowledge of query performance tuning, a set of methods for identifying poorly performing queries, scripts to assist in these processes and the knowledge of how to fix performance problems in their own systems.

PreCon: All About Execution Plans
The key to understanding how SQL Server is processing your queries is the execution plan. This full day session focuses on the execution plan. We will start right at the beginning and talk about the compile process. We’ll also go over how, and more importantly, why, plans are stored in cache and how they are removed. We’ll spend time exploring the key differences between actual and estimated plans, and why those descriptions are more than a little misleading. We’ll also show you assorted methods to obtain a query’s execution plan and what the differences and tradeoffs of each are. A full day class on execution plans would not be complete without spending time learning to read them. You’ll learn where to find useful information in execution plans, what the common operators are and how to decipher the sometimes cryptic messages the plans are sending to you. We’ll also debunk some myths surrounding query operators and execution plans. All of this is meant to further your understanding of how queries work in order to improve the queries you’re responsible for. With this in mind, we’ll show how you can use execution plans to tune queries. All of the information presented will be taken from real world examples. We’ll build on the information through the day so that at the end, after following us through multiple examples at your own computer, you’ll have a stronger understanding of how to read, interpret and actually use execution plans in your day-to-day job.

Feb 02 2011

PASS Summit Location

Andy Warren has posted another one of his excellent summaries of what’s going on at the PASS Board. Andy, thanks for what you do. Those of us who care about what goes on at PASS really appreciate your posts.

The discussion under consideration this time is the location of the PASS Summit. As you may be aware, it’s been held in Seattle for several years now and will be there for at least two more years going forward. It seems that the board is leaning, extremely heavily, towards making it a permanent fixture in Seattle.

I can see why they might do this. First, and biggest, it’s next door to Microsoft. That means the Summit gets tons and tons of Microsoft Employees in attendance, which is a huge draw and a very nice benefit. Second, it’s right near the management company’s headquarters, making it less expensive to get the huge staff needed to make a conference this size work. Third, the staff and volunteers are very familiar with the venue (assuming it stays the venue) so it makes planning and execution much easier. It really does make sense. The strongest of these arguments is, of course, the Microsoft presence.

But…

Yeah, there is a but.

There are a lot of people, just in the US, who live in time zones other than the Pacific. In fact, way over half the US population is located in the Central and Eastern time zones. Let’s also add in Europe for consideration. All these places require extra travel time to get over to the Pacific. That’s added expense to individuals or companies, and remember, we’re talking about more than half the population of the US and all of Europe. That’s for the attendees and the speakers (who are attendees too, make no doubt about it) as well as the vendors and their staff, an extra 3-6 hours of travel, which usually means, an entire day on either side of the summit, just spent travelling. Plus an extra day or two in the hotel. Plus extra money spent on food. Let’s also add that this is frequently non-productive or less-productive times for the companies. And don’t forget the stresses and possible costs to the families left at home when all these people are travelling. All that cost is going to add up, and a heavy percentage of 1/2 of the US and all of Europe, might just decide they don’t want to pay all those added costs. Not every year. One year, maybe two, maybe in a release year, who knows, but not every year.

I guess the question is, are more people going to not show up because of cost than the number of people who won’t show up because the SQL CAT (great people, I’ve met several, helpful, smart, useful, I really appreciate them) won’t be there? The board seems to believe that they will lose more people because of a reduced Microsoft presence than they will lose because of cost.

I’m just not so sure. Based on how the economy has been lately, cost must be a huge factor for many, most, companies deciding how many people to send out for training and networking. Is a company less likely to send their people because some developers won’t be available for questions or because they have to pay more to send people?

I’m just not with the board on this. I think the cost is going to hurt attendance more than the added MS presence will help it. Remember, more than half the presenters are not MS employees, they’re MVPs and others. And, remember, MS will still send a pretty healthy number of employees, just fewer than they would if the event is in Seattle. After all, they want to get in front of you and encourage you to buy and use their products. That’s a big reason why they support the event at all.

I’ve found that asking questions in blog posts usually leads to few, if any, answers, but I’m still going to ask, just to try to understand how far off base my beliefs are, if they are.

Which is more important to you and your company, reduced costs, or more Microsoft people?

Nov 11 2010

PASS Summit 2010, Day 3 Key Note

Today is Dr. Dewitt.

The ballroom, where the keynotes are held, is filled with extra chairs. The Summit organizers expect extra attendance today, and well they should. Dr. Dewitt was amazing last year. I suspect this year will be more of the same.

Rick Heiges is introducing the day (waiting for Dr. Dewitt). Lynda Rab is leaving the board. Sad. I started volunteering for the PASS organization working for Lynda. She’s great. The new board members are Douglas McDowell, Andy Warren and Allen Kinsel.

The spring SQL Rally event was announced. I’ll be presenting a full day session on query performance, Query Performance Tuning, Start to Finish. Look for (a lot) more blog posts on this. The Summit next year has been moved to mid-October. WHOOP! This is great because I was going to miss it next year. Oct 11-15 will be the dates in 2011. Of course, it’ll be in Seattle.

Dr. Dewitt is finally on stage. From this point forward, I’ll be just posting his words & some comments. This is my best attempt to capture the information. There will be typos.

Query optimization is a really hard problem. Dr. Dewitt says “I’m running out of ideas.” Yeah, right. His “Impress Index” is basically an arrow going down. He’s cracking jokes about his delivery, asking, How Can I Possibly Impress You. He’s showing this strange picture that has 240 separate colors that each represent an exec plan in the optimizer. We’ll be back to that. This session was voted on. I’m glad optimization won. Talking about the optimizer developers, he says they live in fear of regression.

The 100,000 foot view, magic happens. He’s working off of TPC-H benchmark, query 8. There are 22 million ways of executing this query. The optimizer has to spend a few seconds to pick the correct plan from this full set. It’s still possible to pick bad plans. Cost-based optimization came from System R & a lady named Pat Selinger at IBM. Optimization is the hardest part of building a DBMS, even after 30 years. The situation is further complicated by advances in hardware and functionality within the DBMS.

The goal of the optimizer is to transform SQL queries into an efficient execution plan. The parser turns out a logical operator tree, which then goes to the optimizer, and a physical operator tree is sent to the execution engine. He’s showing a simple table, based on movie reviews. The query is a SELECT with AVG. Two possible plans. In plan 1, a scan occurs first, then a filter is applied to pull out the right movie and then an aggregate occurs. With this you’ll get a scan, meaning I/O corresponds to the number of pages in the table. Plan 2 uses an index to pull pages from the non-clustered index. This means random disk access that will look up the movies and then pass that on to the aggregate. The optimizer then has to figure out which is faster. The optimizer estimates the cost based on the statistics it has in hand. It has to estimate how many movies there are. So it estimates the selectivity of the predicate, then it calculates the cost of the plans in terms of CPU and I/O time.

So there are equivalence rules for operators such as select & join. Join operators are associative, meaning that joins across multiple tables can be regrouped without changing the result. The select operator distributes over joins, so there are multiple ways of getting back the same information, all evaluated by the optimizer.

With a more complicated query, it could start with a selection of customers, then a selection of reviews, join them together, then join to the movies table and then project out the columns wanted. But with equivalence rules, you can get other plans. The selects-distribute-over-joins rule gets a different plan, or the selects-commute rule can change the plan. He showed five different plans, then four more plans & said he could have done another 20. For this simple query, he came up with 9 logically equivalent plans. All nine will produce the same data. For each of the 9 plans there is a large number of alternate physical plans that the optimizer can choose.

Assume the optimizer has three join strategies: nested loops, sort-merge & hash. He’s also assuming two selection strategies, sequential scan or index scan. Obviously, this is simplified. So, using these three joins & two select methods, there are 36 possible physical alternatives, for one logical plan. So with 9 logical plans there are 9*36 = 324 possible physical plans. And that’s for a VERY simple query.
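To see where 36 comes from, here is a rough sketch of the counting in Python. This is my own illustration, not something from the slides. It assumes the example plan has two joins and two selections (customers and reviews), which is my reading of the query and happens to match the 36 figure: 3 * 3 * 2 * 2.

```python
from itertools import product

JOIN_METHODS = ["nested loops", "sort-merge", "hash"]
SELECT_METHODS = ["sequential scan", "index scan"]


def physical_alternatives(num_joins, num_selects):
    """Enumerate every way to assign a method to each join and each select."""
    slots = [JOIN_METHODS] * num_joins + [SELECT_METHODS] * num_selects
    return list(product(*slots))


# Assumption: one logical plan with two joins and two selections.
per_logical_plan = len(physical_alternatives(num_joins=2, num_selects=2))
logical_plans = 9  # the count Dr. Dewitt gave for this query

print(per_logical_plan)                  # 36
print(logical_plans * per_logical_plan)  # 324
```

Add one more join to the query and the per-plan count triples, which is the point he was building up to.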

Selectivity estimation is the task of estimating how many rows can satisfy a predicate like MoviesId = 932. Plan quality is highly dependent on the quality of the estimates that the optimizer makes.

I just sent in a question.

So the histogram is the distribution of the data within the table. There isn’t enough space within the db to store detailed statistical info, so the solution is histograms. You can have different kinds. The equi-width histogram divides the values into equal-sized ranges (buckets) and then figures out how many rows fall into each range. So, for an actual value, it might be .059 selectivity, but the estimated value is .050. That’s extremely close. But another value he shows has .011 actual but in the histogram is .082, which is a HUGE error. Hello bad execution plan.

Another approach is equi-height histograms. These divide the ranges so that all buckets contain roughly the same number of rows, as opposed to an equal distribution of values. In equi-height, the second example is .033 instead of .082. Which is pretty good, but still skewed. He’s basically showing that errors can be introduced all over the place. The first example is .167.
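Here is a small sketch of the two histogram types in Python, to make the difference concrete. Everything in it is my own assumption for illustration: the data set is made up (it is not his movie data), the function names are hypothetical, and the estimator simply spreads each bucket’s rows evenly across the integer values the bucket covers. Real optimizers keep more detail than this.

```python
def equi_width(values, buckets):
    """Equal-sized value ranges; count the rows that land in each range."""
    lo, hi = min(values), max(values)
    width = -(-(hi - lo + 1) // buckets)  # ceiling division
    hist = []
    for i in range(buckets):
        b_lo = lo + i * width
        b_hi = min(b_lo + width - 1, hi)
        count = sum(1 for v in values if b_lo <= v <= b_hi)
        hist.append((b_lo, b_hi, count))
    return hist


def equi_height(values, buckets):
    """Roughly the same number of rows per bucket; the ranges vary instead."""
    ordered = sorted(values)
    n = len(ordered)
    hist = []
    for i in range(buckets):
        chunk = ordered[i * n // buckets:(i + 1) * n // buckets]
        if chunk:
            hist.append((chunk[0], chunk[-1], len(chunk)))
    return hist


def estimate_selectivity(hist, x, total_rows):
    """Estimate the fraction of rows where value == x, assuming rows are
    spread uniformly over the integer values each bucket covers."""
    rows = sum(count / (b_hi - b_lo + 1)
               for b_lo, b_hi, count in hist if b_lo <= x <= b_hi)
    return rows / total_rows


# Made-up skewed data: value 1 is very common, values 2..20 are rare.
values = [1] * 62 + [v for v in range(2, 21) for _ in range(2)]
actual = values.count(3) / len(values)

width_est = estimate_selectivity(equi_width(values, 4), 3, len(values))
height_est = estimate_selectivity(equi_height(values, 4), 3, len(values))
print(actual, width_est, height_est)
```

On this data the equi-width estimate for the rare value is far too high, because it shares a bucket with the very common value. The equi-height estimate is closer but still not exact, which is the same pattern as his examples.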

Histograms are the critical tool for estimating selectivity factors for selection predicates. But errors still occur. The deal is, there’s just a limited amount of space for these. Other statistics are rows, pages, etc.

When estimating costs, the optimizer considers I/O time and CPU time. Actual values are highly dependent on the CPU and I/O subsystem on which the query will be run. For a parallel database system, such as PDW (plug), the problem also includes network traffic. So back to the two alternative physical plans… You have to determine which plan is cheaper. Assuming that the optimizer gets it right, we know that there are 100 rows out of 100k pages. These are sorted on date, but we’re going for MovieID, so random reads. The optimizer doesn’t know what system it’s on, but it makes a guess that a scan will take 8 seconds. The filter will work at .1 microsecond/row and the aggregate at .1 microsecond/row, for .00001 seconds, for a total of 9 seconds. Plan two will use the index. Since the rows are sorted on date, random seeks are going to occur. At .003 seconds/seek, the total time is .3 seconds, plus the same time for the aggregate. This means plan two is the winner.

But what if the estimates are wrong? On a log plot, you start to see how each plan performs better or worse depending on the number of rows returned. More rows will make plan 1 better, but fewer will make plan 2 better.
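A quick sketch of that crossover in Python, using the two numbers I caught from the slides (an 8 second scan and .003 seconds per random seek). The rest is my own simplification: I’m assuming the scan cost stays flat no matter how many rows match, and I’m ignoring the tiny CPU costs for the filter and aggregate.

```python
SCAN_SECONDS = 8.0    # from my notes: estimated time for the full scan
SEEK_SECONDS = 0.003  # from my notes: estimated time for one random seek


def scan_plan_cost(matching_rows):
    # Assumption: the scan reads every page regardless of how many rows match.
    return SCAN_SECONDS


def index_plan_cost(matching_rows):
    # One random seek per matching row.
    return matching_rows * SEEK_SECONDS


def cheaper_plan(matching_rows):
    if index_plan_cost(matching_rows) < scan_plan_cost(matching_rows):
        return "index"
    return "scan"


for rows in (100, 1_000, 10_000):
    print(rows, cheaper_plan(rows))

# Rough break-even point under these assumptions.
print(SCAN_SECONDS / SEEK_SECONDS)
```

With these numbers the index plan wins at 100 and 1,000 rows and loses at 10,000, with the break-even somewhere under 2,700 rows. If the optimizer thinks 100 rows are coming back and 10,000 actually do, it picks the wrong plan.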

That was just to get the data out of a table. To add in JOIN costs, things get worse. First example is to take a sort-merge join. This sorts each data set being returned, and then merges the results through a simple scan. Cost is 5R + 5M for I/O. A nested loop works by scanning one table and, row by row, scanning the other table. The cost is R + R * M. R is rows, M is pages.
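Plugging numbers into those two formulas shows why the input sizes matter so much. This is only a sketch in Python: I’m treating R and M simply as the sizes of the two inputs, which is an assumption on my part since my notes on the units are thin, and the function names are mine.

```python
def sort_merge_cost(r, m):
    # I/O cost as noted from the slide: 5R + 5M
    return 5 * r + 5 * m


def nested_loop_cost(r, m):
    # I/O cost as noted from the slide: R + R * M
    return r + r * m


def cheaper_join(r, m):
    if nested_loop_cost(r, m) < sort_merge_cost(r, m):
        return "nested loop"
    return "sort-merge"


for r in (2, 10, 1_000):
    m = 1_000
    print(r, nested_loop_cost(r, m), sort_merge_cost(r, m), cheaper_join(r, m))
```

The nested loop cost grows with R times M, so it only wins when one side is tiny. Sort-merge grows with R plus M, so it barely notices the difference between a small and a large input.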

With the example, you can see that with an index in place, highly selective, loop joins can be cheap. But it’s the cardinalities that affect things. So, getting the histogram right is the key trick. With a log plot, again, you see how the various operations vary over time. So for a sort merge, it’s very expensive at a low number of rows, but at a large number of rows, it still returns in about the same amount of time. So as large sets of data are accessed, merge gets good. But at lower numbers of rows, the nested loop works better. So if the cardinality estimate is off, you could get a huge error in performance, especially at the larger sets of data. The optimizer has to pick the right join method. This is based on the number of rows in each set of data being joined.

He then moves on to talk about how much space these things take up. The space depends on the “shape” of the query. He shows a type called a “star” join and a type called a “chain” join. Whoa! As you increase tables, the likely number of plans increases a lot. I knew this, but I haven’t seen it written down like this. But these shapes are extremes.

Every query optimizer starts off with a left deep plan, first, instead of bushy plans. For the example, a bushy tree would have 645k equivalents for the Star Join as opposed to 10k for left deep plans. With 3 join methods and n joins in a query, there will be 3 to the power of n possible physical plans. Uh… wow. Instead, the optimizer uses dynamic programming. Sometimes heuristics will cause the best plan to be missed.

One method of optimization is Bottom Up. Optimization is performed in N passes (if N relations are joined). First pass, find the best 1-relation plan for each relation. Pass 2, find the best way to join the result of each 1-relation plan to another relation to generate all 2-relation plans. Pass n, find the best join result… can’t see it. Gets the lowest cost plans & interesting order rows. In spite of pruning plan space, this approach is still exponential in the # of tables. Costs are done, then pruning occurs. I’ve stopped taking notes on this part. You’ll have to see how this works in the slide deck (I’ll post the location at the end).
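Since I gave up on the notes there, here is a minimal sketch of the bottom-up idea in Python. To be clear, this is my own toy version and not what was on the slides. It only builds left deep plans, the cost of a plan is just the sum of the intermediate row counts (a made-up cost model), it ignores interesting orders entirely, and the table sizes and selectivities are invented.

```python
from itertools import combinations


def best_left_deep_plan(cardinality, selectivity):
    """Bottom-up search for the cheapest left deep join order.

    cardinality: {table: rows}
    selectivity: {frozenset({a, b}): fraction of the cross product kept}
    Returns (cost, rows, join_order). Cost is the sum of intermediate rows.
    """
    names = sorted(cardinality)
    # Pass 1: the best 1-relation plan for each relation (no join cost yet).
    best = {frozenset([n]): (0.0, float(cardinality[n]), (n,)) for n in names}
    # Pass k: extend the best (k-1)-relation plans by one more relation.
    for size in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, size)):
            candidates = []
            for last in sorted(subset):
                rest = subset - {last}
                cost, rows, order = best[rest]
                out_rows = rows * cardinality[last]
                for other in sorted(rest):
                    # No entry means no join predicate: a cross product.
                    out_rows *= selectivity.get(frozenset((other, last)), 1.0)
                candidates.append((cost + out_rows, out_rows, order + (last,)))
            # Pruning: keep only the cheapest plan for this set of tables.
            best[subset] = min(candidates)
    return best[frozenset(names)]


# Invented numbers: customers is already filtered down to 100 rows.
cardinality = {"customers": 100, "reviews": 100_000, "movies": 1_000}
selectivity = {
    frozenset(("customers", "reviews")): 1 / 10_000,
    frozenset(("movies", "reviews")): 1 / 1_000,
}
cost, rows, order = best_left_deep_plan(cardinality, selectivity)
print(order, cost)
```

The pruning step is what keeps this from enumerating every join order: once the cheapest way to join a given set of tables is known, every more expensive way to join that same set is thrown away. The number of table subsets still grows exponentially, which matches his point that the approach is exponential in the number of tables.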

So that’s the theory. But the problem is, bad plans can be picked. Statistics can be missing or out of date, cardinality estimates can be made against skewed data, attribute values can be correlated, and regressions and hardware changes mess stuff up.

Opportunities to improve. Jayant Haritsa has the Picasso Project. Bing this: Picasso Haritsa. There is actually software there that helps improve values. He’s back to TPC-H Query 8, and using the tool, it will show the plan space for the query. This is the painting, the cool picture at the start of the talk. With this, you can see how sensitive plan generation is to input parameters. So the cardinality estimates are the key.

This animation shows how the estimated costs for a query start low, peak, and then, instead of continuing up, go back down. And the optimizer team doesn’t know why. This is his example of how QO is, indeed, harder than rocket science.

What can you do better? Well, Indexed Nested Loops look good, but they’re not stable across the range of selectivity factors. If they went conservative and always picked sort-merge, it would be more stable. So, picking slower operations could make things more stable, just slower. Robustness is tied to the number of plans. And he says the QO team doesn’t understand why.

At QO time, have the QO annotate compiled query plans with statistics and check operators. Then, you can see how this stuff works. They use this in two ways, a learning optimizer and dynamic reoptimization. With the learning optimizer, observed stats go back to a statistics tracker, which feeds them back through to the catalog, and the next query will be better. Dynamic reoptimization compares the actual stats against the estimated stats and, when there are differences, truncates the operation, pauses the execution, writes the output back to tempdb, stores that, and then uses it with the rest of the query to re-optimize using real values. Cool!

Key points: Query optimization is harder than rocket science. Three phases of QO: enumeration of the logical plan space, enumeration of alternate physical plans, and selectivity estimates. The QO team of every DB vendor lives in fear of regressions, but it’s going to happen, so cut the optimizer some slack.

“Microsoft Jim Gray Systems Lab” on Facebook is the source for the slides. Available here.

Nov 10 2010

PASS Summit: Day 2 Keynote

Today is Kilt Day at the PASS Summit. We’re going to try to arrange a group photo at lunch time.

The network connection is extremely slow. I suspect the tweeting about the kilts.

Bill Graziano is leading the keynote and he started off by having all the kilted stand. Only about 12-15 of us, but that’s five times better than last year. Then it was time for the volunteers to stand up. It was excellent to see so many people. The Outstanding Volunteer of the Year was Lori Edwards. The PASSion award went to Wendy Pastrick, who really earned it.

Unfortunately the next segment was on governance… blech! But necessary. Everyone here is a member, so they should know how the money is spent. Luckily Bill is not digging in a lot. He’s covering the things he has to. Yes, it’s a boring topic, but this is a not-for-profit organization and it needs to be transparent. I’ve always been happy to see the numbers, even when it bored the heck out of me.

An Xbox Kinect was given out to a lucky winner. Cool! I was too busy yesterday to take advantage of the contests… ah well.

Today is also the Women In Technology Luncheon.

The first speaker of the day is Quentin Clark of Microsoft. Mr. Clark is introducing Denali. Today we should get some meat. The goal is shifting user expectations and shifting business expectations. Sadly, I was extremely excited about this presentation, but, instead of getting into the product, we got quite a lot of sales pitch. I do want to see what they think is the most important functionality, but I want to see it, not hear about it. That’s important. I think vendors frequently don’t think about the audience. The Twitter stream started to get pretty abusive, just like last year during the “I can’t mention the major hardware vendor that supports PASS because we really appreciate it” presentation.

Finally, after 40 years in the wilderness, we got a demo of SQL Server Always On. He started right into Management Studio, which is the first time I’ve seen it in the last two days during any of the Denali demos. That’s an indication of something. This is pretty neat. Automatic failover with multiple secondaries, so you can have more than one data center, around the country and have synchronous data in multiple sites. THAT will be useful. This without shared disk. Yes, you can still use it, but now you don’t have to. That’s a huge improvement over what we’ve had in the past. And, he got an ovation during the demo. When you have a collection of nerds as big as this clapping for you, you did something right. Thank you Microsoft. The data synchs occur in near real time, behind the scenes, with HA set ups that you can put together, for individual databases or groups of databases, in about five minutes. Huzzah! Oh, and the secondaries can be set to be readable and you can move your backups to the secondary… WOW! Again, thank you Microsoft.

The breakdown of the goals is the same as outlined yesterday, of course: Mission Critical, which they just showed, then IT Pro & Developer Productivity and Pervasive Insight. Then Mr. Clark mentioned DAC and there was a low rumble around the blogger table. That is not a popular set of functionality. There are going to be enhancements in spatial within Denali, modifying the abilities to run queries and moving all the way through the BI stack. We’re finally getting Sequence Generator and Paging and enhanced Error Handling.

FileTable, a whole new integration of FileStream technology, is being demo’ed next. This should be good too. The Key Take Away is “Every Windows application that generates files can now store files within SQL Server without a single modification to the app.” I’m not so sure this is a good thing, and what about SharePoint? Still, technology is cool and I’m geek enough. I’m going to enjoy it. So, to a degree, this works like FileStream, but it’s file management through the database. The demo showed a set of files getting inserted into SQL Server management through a command prompt. Oooh… That’s cool. The demo is impressive. You can update the documents from the file system or from the database. That’s pretty neat. I’m just not sure exactly where this goes within the enterprise. I’ll have to read some more about it.

The next set of functionality is Project Juneau. I’ve heard a lot about this. It’s likely to hurt some of the 3rd party tools. We went right to the demo this time. Thanks. We’re in the VS 2010 Shell now, along with BIDS and everything else. They’re not retiring SSMS, but it’s clear that it’s on the way out, must be. I like the improved T-SQL completion. The table designer is good too, because you can sync the visuals & T-SQL as you create the table. That’s great! I think I said this yesterday, but there are a lot of people that will not enjoy moving to Visual Studio. I’m a fan, but others will not like it. Still, it looks good. It’s working better than it ever did, and that’s a good thing.

Nov 09 2010

PASS Summit: Day 1 Keynote, Part 3

Ted Kummert is still talking.

For the cloud, of course, they’re talking about SQL Azure. Microsoft really is throwing themselves into the cloud, completely. The emphasis is that they offer both a cloud and an on-premises solution. I don’t mind saying, I’m still trying to get the full business proposition for an old school, fat business like the one I work for. What should we be doing with the cloud? I just haven’t seen the magic. I see where smaller businesses, or start-ups, or temporary surge capacity for businesses that may have that type of thing can use the cloud, but… traditional work, it just doesn’t seem to jibe yet.

We’re going to see some made-up scenarios for how Azure can manage Contoso Bikes. He shows how the report can pull data from the cloud and deploy reports from the cloud, in order to deliver to people on the road. But, we can do that already in other ways. The ability to link your data with the Data Market data is pretty cool. I can see that being useful. You will have to purchase access to these data sets. You can query against them, but, similar to the PDW demo, we’re not in SSMS any more. I wonder what Microsoft’s long term plans are for SSMS based on the ways we’re seeing it being bypassed.

What’s next for SQL Server? Denali. The CTP is getting handed out tomorrow after the keynote. We’ll be seeing the demo on Denali tomorrow. The idea that Mr. Kummert is communicating is that Denali represents client requests. Their targets are Mission Critical, IT Pro & Developer Productivity, and Pervasive Insight. They’ve focused on manageability and upgrade capacity. That should be good. They’re going to work on performance, which is interesting. They’re unifying the experience into Visual Studio… I’m OK with this, but I know that a LOT of DBAs are not OK with this. It’ll be interesting to see how it breaks out. Denali is the largest release of Integration Services ever. Full life cycle development on SSIS. That will be good. They’re also talking about expansion on the PowerPivot type of work. Project Crescent is a new reporting tool that is coming out with Denali, which is a new way of showing business information. Sounds good. Finally, a demo. We’re seeing the 100 million row demo, again. I’d like to see the new stuff, please. So, they pulled the data out of Excel and directly into Analysis Services. That’s good. Showing how it’s working within VS, which gives you source control, etc., and then you also get to use the server, which is better than the memory limits within PowerPivot. And he’s showing how over 2 billion. This is a great demo. We’re seeing a trillion rows per minute, filtered & reported on. It’s very slick. This is good. Same technology is also in the database engine. We’re seeing fantastic performance. I might be out of a job. It’s based on the columnar data store technology. It’s a very good thing.

Come back for more tomorrow!

Nov 09 2010

PASS Summit: Day 1 Keynote, Part 2

Mark Souza from the SQL CAT Team, some of the smartest & most capable of MS consultants in SQL Server, is presenting how his team is offering a health check for people’s SQL Server systems.

They’re actually going to be using some technology to do this little event called SQL PASS It On, using Twitter. Twitter is becoming more and more of a major part of the event. If you’re not at least monitoring Twitter, you’re missing out.

It’s a busy day with the SQL Clinic, the Exhibit Hall, Community Learning Center, Birds of a Feather Lunch, Regional Mentors, Book Signing and Exhibitor Reception. That’s not mentioning all the sessions.

The key notes will be Ted Kummert today, Quentin Clark tomorrow, and David DeWitt (YAY!) on Thursday where he will talk about Query Optimization. I will be taking notes!

We’re seeing a history of how Microsoft split the code from Sybase for the SQL Server 7.0 release. They built a brand-new database platform in 2.5 years. That’s pretty amazing.

They started off with SQL Server 7.0 for ease of use. Ted Kummert is emphasizing how important Total Cost of Ownership is to Microsoft and their plans. He’s also talking about how important it is that SQL Server is integrated, including Analysis Services and Cloud. His final focus is on large scale, high availability systems. This is the history of what they’ve built. Now, he’s going to focus on the future, starting with mission critical, then covering the cloud, and finally what is going to happen with SQL Server Next.

For mission critical, they’re releasing the Parallel Data Warehouse, which will allow for 100s of terabytes in what is basically an appliance. That’s right, a toaster for SQL Server. Seriously, this is a big deal. The demo is already fascinating. He’s showing how you create tables with the distribution and partitioning in place. It also comes with a special PDW loader, which will load up to 1 TB an hour of data. It can even be integrated with SSIS. This is pretty amazing. On the Tweet stream I saw Michelle Ufford mention that she’s looking at it for GoDaddy, so this is viable. They then showed how they could move 800 billion (yes, that is a “b”) rows into the system in 19 seconds. Interesting point from Brent Ozar: what they were doing was not in SSMS. Paulo Resende from Bank of America came out to give a customer testimonial on how they implemented PDW. Now Dave Mariani of Yahoo is giving another testimonial on how they manage User Data & Analytics for… well… spam. They’re running through 1.2 TB a day and 50 GB an hour… uh… WOW! The fascinating thing is, they’re moving that data in a cube for the queries and are able to pull out data in less than 10 seconds. That’s great. Microsoft is also announcing “Atlanta” which is a service that assesses the configuration of your 2008 and 2008 R2 systems, through the cloud. Bob Ward, cool, is out to show how Atlanta works. This is extremely cool stuff. I’d like to think that I keep most of my servers up to date, but a service like this could still be extremely useful.

Nov 09 2010

PASS Summit: Day 1 Keynote, Part 1

Sitting at the big kids table at the PASS Summit, ready to rock and roll. The Summit has not officially started yet, but it’s been a fantastic ride already. I’m getting to meet a bunch of great and amazing people. I made my very first trip out to the Microsoft campus yesterday. Last night was the SQL Server Central party. This is just a great organization and a great event.

Right at the start, the tweeting is hot & heavy. Hmmm… OK, starting off with a Tina Turner impersonator. She’s extremely good, but I have to ask, what were they thinking? Her name is Truly Tina. She was outstanding. Just a bit odd.

Rushabh Mehta is introducing the PASS organization. He’s showing off the Board of Directors and the executive committee. He’s also showing what else PASS has besides the Summit, which includes 24 Hours of PASS, SQL Saturday and the European Summit. The organization also includes the chapters and the virtual chapters. The organization reaches thousands of people around the world through all these events and organizations. The goal this year is to try to get to 250,000 members.

This year the summit has 3807 registrations from 48 countries. The keynote is streaming live, as well as 40 people blogging and tweeting away. If you want to follow the tweets, make sure you use the hash tag #sqlpass. There are 191 speakers with 44 of them MVPs.

Nov 03 2010

Kilt Day

A week from now will be Kilt Day at the PASS Summit. It’s probably way too late to order a kilt at this point. But, don’t despair. You can still take part. Just a short walk from the Summit is the headquarters of Utilikilt. These are not classic tartan wraps with sporrans and socks. They’re the modern equivalent, come in fun fabrics & colors and are actually pretty practical. So if you still want to participate in Kilt Day, and we’d love to have you, plan a trip to Utilikilt.

And no, they’re not sponsoring me or anything (more’s the pity). I just like them.