Jul 08 2014

Worst Day of a DBA's Life

Red Gate Software is running a campaign around coping with the worst day of a DBA's life. We've been posting some really fun stories with, I hope, a kernel of useful information inside each. Chances are, if your DBA career has been like mine, your worst days don't involve explosions and car chases. But now they're asking people to write up their own stories: what was the worst day in your life as a DBA? I know, I know, first world problems, right? Regardless, I have yet to put a company out of business or kill anyone with any errors I've made, although I've worked at places where either was possible. But the one day that stands out, well, it started about three weeks ahead of the bad day.

I was working for an internet startup. I was very much still learning the ropes as a DBA, although I had helped design a system on SQL Server 7.0 that was collecting about 1GB of data a day. Back in those days, that was big data. But, frankly, I wasn't doing the monitoring correctly. I was doing a lot of manual checks and manual maintenance, stuff I should have taken the time to automate. Live & learn, right? Anyway, because I was taking a lot of time out of each day to do maintenance and run checks on the systems, I wasn't spending much time supporting the development teams. One day, one of the managers came in and said, "No more maintenance. Things should be fine. Spend time on development." And he was serious. I argued and lost. So I started spending a lot of time doing development and let the maintenance slide.

Fast forward three weeks. Things had largely been stable, but I didn't have monitoring in place, so I wasn't noticing that we were running a little hot on transactions. The transaction log was bigger than it had been. And then disaster struck. The backup drive filled. I didn't notice. Transaction log backups started failing. I didn't have alerts. The log drive filled, all the way, and our 24/7, zero-downtime web site went kablooey.

It was 2 in the afternoon or something, so I and my fellow DBAs were right there. We immediately tried to back up the log, but it wouldn't back up. We tried adding a log file. Didn't work. Then we started getting creative. We tried every possible thing we could think of. Some of them failed quickly, so we tried something else. Some of them took hours to fail, making us think we had fixed the problem. It took us 48 hours of failed attempts before we finally admitted defeat, went to the last good backup and restored the database, losing about 12 hours' worth of transactions. It was a nightmare.

The good news was, our directive to stop doing maintenance was rescinded. We immediately went about setting up alerts and other things so that we wouldn't get surprised like that ever again. It wasn't fun, but it was a great learning experience. Plus, all the troubleshooting for 48 hours straight built excellent camaraderie within the team. That said, I'd rather have avoided the whole thing, and could have with proper monitoring and alerting.
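
For what it's worth, even a couple of basic safety nets would have caught this. Here's a minimal sketch of the kind of thing we put in place afterward (the alert name and response delay are made up; wire the alert to an operator with sp_add_notification so it actually reaches a human):

-- Fire a SQL Agent alert when any database hits error 9002 (transaction log full).
EXEC msdb.dbo.sp_add_alert
    @name = N'Transaction log full',
    @message_id = 9002,
    @severity = 0,
    @delay_between_responses = 300,
    @include_event_description_in = 1;

-- Quick manual check of log usage across all databases while proper
-- monitoring gets built out.
DBCC SQLPERF(LOGSPACE);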

Jul 03 2014

Reflections on the 2014 PASS Summit Selection Process

Oh we are a bunch of high school kids at heart. Maybe high school never ends (and there’s a nightmare, god I hated high school). But, there’s been drama about the 2014 PASS Summit sessions and the Selection Committee’s work. I was on the committee. I worked as a part of the team responsible for rating sessions for the Azure track (said track is gone, more on that later). As self-serving a statement as this is, I think we did a good job. Further, I think the process worked. You can read the official explanation of the process here. Amy did great work and deserves your thanks. All the volunteers who reviewed over 900 submissions from more than 300 people, ON THEIR OWN TIME, FOR FREE, also deserve your thanks. The vitriol directed at the PASS organization over the outcome of this selection process is not directed only at the Board. It’s also directed at the volunteers. And, as a volunteer, that sucks.

The team I worked on rated, I forget, 50 sessions I think. We had to read through them and give them a score based on several criteria. We also had to write comments on each and every session. I was dinged by HQ for not writing a comment on a session that I gave 5s to on the ratings (so I commented something like "Can't wait to see this at Summit"). We were only given 10 slots to fill, so that means 40 sessions got kicked to the curb. That's a lot of people who didn't get selected. And not getting selected sucks (yes, I do know, I've been rejected by a number of events this year, big ones, even ones I've spoken at previously; not whining, just pointing out that I don't have a secret method for getting accepted). Our track actually got eliminated and the sessions that we selected were distributed to other tracks. Also, a couple of sessions we rated highly didn't do so well when the speaker scores were applied, so there was some shift there (one thing PASS could improve: give us some indication of the secret sauce there, we know there is one, but a little understanding of how it's applied would help). But overall, the sessions we rated highly got selected. Congratulations and well done to those speakers. Just look at the people presenting, many for the first time. That's going to be an absolutely awesome event. And once more, thank the volunteers for doing all that work.

So, some of you are now thinking that, “Oh, Grant’s on the side of PASS” (well, actually, yes, I am, so should you be) “Grant has been told to be nice and play good and not be critical” (even though I’ve already made a criticism about the magic numbers and I was tweeting almost literally threatening messages this week) “Grant got selected so he’s being a <insert bad name here> about the whole thing” (I may or may not be a <insert bad name here> but I don’t agree that I’m being one about this) or, maybe you’re on the other side “OMG! He’s criticizing PASS in any regard, The HUMANITY! Have you at long last sir no decency” (no, not really).

Remember those comments that I had to write for every abstract, including the great ones? I put a small critical review of the abstract in every one (OK, not the one that I gave 5s to). I said what was wrong with the abstract in my subjective opinion. And let's be perfectly clear about that (channeling President Obama): they're my opinions. If I thought you didn't define the problem space your presentation was meant to address, I said that. You disagree? OK. If I thought your very clever and witty title seriously detracted from the clarity of what the session was about, and it wasn't that witty, I said that too. You're the wittiest person you know and everyone says so? OK. My opinion may not jibe with yours. But it's the one thing I've seen everyone who has ever been rejected by the committee ask for: "Tell me what I can do to improve." OK. I did. At least in my opinion. On every single abstract (except that one).

PASS didn’t release them.

And then, PASS did.

The volunteers (unpaid, remember) did the work, and now it gets to see the light of day.

This brings up a number of points. First, when I got rejected by those other events, did I get a reason for my rejection? Nope. Other events just reject you, thanks for playing. I think PASS, which is all about community, should be different. We should tell people why, not just that there were higher rated sessions, but what they can do to improve. I've talked to people in the know; not all the comments provide that kind of information. I think we'll get better next time. Second, people's feelings are going to be hurt by these comments. Yes. Yes they are. Suck it up, buttercup. You want to know what you can do to improve so you can get selected, but your abstract is absolute perfection (in your opinion), so how dare someone else suggest that it's not worthy of inclusion, blah, blah, blah. We're going to see lots of blog posts where people disagree with these comments and that could reflect back in some negative way on the organization. I suppose so, but if we're going to be about community and we're going to try to raise up new speakers, we're going to have to be able to deal with some degree of friction. That may even come from experienced people irked that they didn't get picked. Everyone has a bad day. Again, I think we can weather this. Finally, the different teams and individuals on the teams probably gave substantially different levels of comments with varying degrees of quality. Some of the comments are just going to be useless. Further, my opinion probably doesn't jibe with my teammates' in every regard. Maybe a team didn't put critical comments in at all (although they had to put in comments). Yes, these things are going to be uneven, maybe even contradictory. OK. Again, cope.

This blog post started off as a rebuke of the selection process around those comments. It's not that now. I want to repeat, one more time: I think the committee did great work and selected an awesome set of presentations that will make for a wonderful Summit. Thank you for all your hard work. And thank you, Amy, for doing a great job organizing what is a daunting task. And thanks for releasing the comments.

Jul 02 2014

Statistics and Natively Compiled Procedures

Statistics are one of the most important driving factors for the behavior of the query optimizer. The cardinality estimates stored within the statistics drive costing, and costing drives the decision making of the optimizer. So, how does this work with the new SQL Server 2014 natively compiled procedures? Differently.

In-memory tables do not maintain their statistics automatically. Further, you can't run DBCC SHOW_STATISTICS to get information about those statistics, so you can't tell whether they're out of date or what the distribution of the data within them looks like. So, if I create some memory-optimized tables, skip loading any data into them, and then run this standard query:

SELECT  a.AddressLine1,
        a.City,
        a.PostalCode,
        sp.Name AS StateProvinceName,
        cr.Name AS CountryName
FROM    dbo.Address AS a
        JOIN dbo.StateProvince AS sp
        ON sp.StateProvinceID = a.StateProvinceID
        JOIN dbo.CountryRegion AS cr
        ON cr.CountryRegionCode = sp.CountryRegionCode
WHERE   a.City = 'London';

The estimates in the execution plan for the query show some pretty wild values:

Estimates

That's an estimated number of rows of 65,536 and an actual of zero in a table scan, because I created my table without an index on the City column. If I recreate it with an index, but still no data, the estimates change and we have an index seek operation now:

EstimatesIndex

That’s suggesting that the optimizer thinks there are 256 rows. But there’s no data here. So, let’s load some data into the table. Here are the new estimates from the index seek operator:

estimateswdata

I haven’t yet updated the statistics, so the optimizer still thinks there are zero rows in the table, or at least, it has no statistics. Well, not true:

SELECT  s.name,
        s.auto_created,
        s.user_created,
        s.filter_definition,
        sc.column_id,
        c.name AS ColumnName
FROM    sys.stats AS s
        JOIN sys.stats_columns AS sc
        ON sc.stats_id = s.stats_id
           AND sc.object_id = s.object_id
        JOIN sys.columns AS c
        ON c.column_id = sc.column_id
           AND c.object_id = s.object_id
WHERE   s.object_id = OBJECT_ID('dbo.Address');

If I run this, I’ll see statistics on the table, including system generated statistics:

stats

One point, while statistics don’t update automatically, they clearly can still be created automatically. But I can’t run DBCC SHOW_STATISTICS to see what’s in there. So, let’s see what estimates look like in the natively compiled procedure. I’ll take the same query code above and convert it to a proc. Then, when I capture the estimated plan from the procedure (no actual plans allowed), the index seek operator shows these estimates:

estimatescompiled

Now we have an accurate estimate of the number of rows this query is likely to return. Interesting. So, let's drop the procedure, update the statistics, and then rerun the query and the procedure. First the query. The estimates don't change. I'm still seeing an estimate of 256 while the actual is 434. So, let's free the procedure cache and try again:

EstimatesUpdated

Ah, there we go. The plan itself came out the same way, but we clearly have more accurate estimates now. On to the procedure. I’ll recreate it and then get the estimated plan. Here are the estimate values from the same index seek operation:

Estimatescompiledupdated

Oops. Still estimated 0 rows.

What's all this mean? I'm not sure. The documentation from Microsoft in this area is sketchy. During the most recent 24 Hours of PASS, I was able to ask Microsoft about the impact of statistics on natively compiled plans. They suggested that it was not necessarily going to be the same as we see in standard queries. These tests make that fairly evident. Also, it looks like the default values of estimated rows for in-memory tables are different. If I create standard tables, empty, and run the same query against them, the estimated number of rows is what I expect, 1. But in the case of in-memory tables it's 256 with an index and 65,536 without one (or at least that's what I'm seeing). However, the estimates for the natively compiled procedure never changed in this test case, always at 0. This is hardly shocking, but it seems that different rules apply for in-memory tables and their statistics, as well as for natively compiled procedures and their consumption of those statistics. And, as Microsoft has changed the default estimated number of rows for table variables from 1 to 100 in SQL Server 2014, it seems we have another instance where they're defaulting to an even higher value, and another where the values seem to just disappear.
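
If you want to see the difference in defaults for yourself, here's a rough sketch of the comparison (the table names are made up, and it assumes a database that already has a MEMORY_OPTIMIZED_DATA filegroup):

-- Two empty tables, one disk-based, one memory-optimized. Compare the
-- estimated row counts in the estimated plans for the two SELECT statements.
CREATE TABLE dbo.AddressDisk
    (AddressID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
     City NVARCHAR(30) NOT NULL);

CREATE TABLE dbo.AddressInMem
    (AddressID INT IDENTITY(1, 1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
     City NVARCHAR(30) COLLATE Latin1_General_100_BIN2 NOT NULL)
    WITH (MEMORY_OPTIMIZED = ON);

-- Disk-based table: estimated 1 row. Memory-optimized table: I'm seeing
-- 65,536 for a scan, and 256 once an index on City is in place.
SELECT ad.City FROM dbo.AddressDisk AS ad WHERE ad.City = N'London';
SELECT ai.City FROM dbo.AddressInMem AS ai WHERE ai.City = N'London';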

The behavior of statistics within in-memory tables is extremely interesting because you may see plans shift as your queries get more complex and your data changes. It makes a very strong case for making sure that you update your statistics on a regular basis on these tables.
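
Since nothing is going to update those statistics for you, a scheduled statistics job for the in-memory tables is the obvious answer. A minimal sketch (in SQL Server 2014, memory-optimized tables require FULLSCAN or RESAMPLE along with NORECOMPUTE):

-- Run after significant data changes, or on a schedule; nothing else will.
UPDATE STATISTICS dbo.Address WITH FULLSCAN, NORECOMPUTE;
UPDATE STATISTICS dbo.StateProvince WITH FULLSCAN, NORECOMPUTE;
UPDATE STATISTICS dbo.CountryRegion WITH FULLSCAN, NORECOMPUTE;

-- Natively compiled procedures don't recompile, so drop and recreate them
-- afterward if you want them to pick up the new statistics.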

I’m taking this show on the road. If you want to get an all day class on query tuning, I’ve got a lot of opportunities coming up. I believe that Albany, on July 25th, is not yet sold out. You can register here. I’m teaching an all day session at SQL Connections in September in Las Vegas. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. I’m excited, and more than a little humbled, to get the opportunity to present an all day pre-conference seminar at the PASS Summit in Seattle in November. Go here to register.

Jun 30 2014

The Curse of Working With A DBA

I had no more than finished my rant from last week when I started thinking that a healthy chunk of the reason developers want to bypass relational databases is not the horror of the relational database itself, although that's there. No, a very large reason is the DBA.

We're on a blog called The Scary DBA. I earned that title, well, sometimes. Sometimes I got it and I wasn't sure why. However, it's perfectly in keeping with how many people view their database administrators: grumpy, obstructionist, slow, difficult, control freak, etc. There are even jokes about it: "What's the DBA's favorite word? No!" And for those answering "It depends," that's two words.

I understand why. In large part it’s that phone in your pocket (used to be a pager on your belt, I’m old). That darned thing can go off any time night or day. It tends to make you very gun-shy. You start doing anything you can to keep that thing from going off. And developers, holy moly, they want to change things. They want to introduce new tables and new queries and they want to do it all really fast, faster than you can possibly review all that code, and all, ALL, AAAAALLLLLL, that code needs to be reviewed before we can let it unsettle our production servers. No.

And developers get crazy ideas in their heads sometimes. Maybe it would be easier to put the queries in the code rather than in stored procedures? What? How the heck can I review all the code too? No.

Developers also start thinking to themselves, you know, most of this T-SQL code could be generated using other code. Wait, that means even more T-SQL generated even faster, and generated by a program, and I can't review that program, or its code, and you want to put it into my production server? Are you smoking something over there? No.

CLR? Ha! No.

ORM tools? Have you seen that T-SQL? Hell no!

How about other tool sets? Maybe an object database would work here. We may be better off using unstructured storage for data collection in this situation. ID/Value pairs might work well for this application. No, no, no and no again. Just in case you think of something else, no.

Gee, I’m sure if I were a developer I’d be perfectly happy with this approach. I’ve no doubt as developers introduce even harder subjects like agile development, devops, and other things in the future the responses will be just as nuanced. In short, I’d be doing anything I could to bypass the DBAs too.

So, what do we do about it as DBAs? Change. Use the word “Yes.”

We need to recognize that the business is changing, fast. That means that the applications are going to change, faster. We, DBAs, must become enablers. We must create processes and methods that smooth and speed the deployment process and provide lots of opportunities for automated testing, because you're not going to review all this code and you can't just stand in the way. We need to adapt to, and adopt, the development and deployment paradigms used by the developers. We have to start treating databases, as much as possible, like code. We need to have our code in source control alongside the application code. We need to be at the stand-ups. In short, we need to change what we do and the way we do it. We can't just say no. We need to say yes. The goal is to get in with the developers and influence through assistance, not just stand in the way.

Is it more work? Yes. Is it going to be hard? Yes. Will we have to go quite a long way to convince them that we're not just going to say "No" again? Yes. Are there damned good reasons for us to make these efforts, so that someone who loves and protects the data is in the room providing skills most programmers haven't developed? Yes again.

See, it’s easy. Try it.

Jun 26 2014

Passion

I know I tend to be overly passionate. It’s something that has gotten me into trouble in the past. It’s also probably a huge factor in the things I’ve been able to accomplish in life. I’m bringing it up at this time because I think passion is causing some conflict within the community around the Professional Association for SQL Server (PASS).

On the 25th of June just past, the announcements went out for the sessions accepted at the PASS Summit 2014. I found this stressful and exciting in two ways. First, and for me personally, most importantly, because I had submitted several sessions and I desperately wanted to speak at the PASS Summit (I've spoken there every year since 2008 and I've made the Top 10 sessions two years in a row, for which I'm truly grateful; back to our story). Second, because this year I wanted to help make a difference, so I volunteered on the selection committee (and I was on a committee other than the one I submitted for, so I didn't influence selection there at all). I wanted to get my sessions accepted, and I wanted to see the work I put in on display. Happily, both occurred. But the day was marred.

Let’s sidetrack (again) for a moment. I consider myself to be just a guy, a DBA, a developer, an IT pro. It’s what I’ve been doing for 20+ years (yeah, I’m old) and I’ve been relatively successful at it. But, I’m also a Microsoft MVP, a published author, frequent blogger, and an international speaker. I attribute most of that stuff, not to any great ability I have, but to a lot of luck, a lot of hard work, and, here’s the kicker, to my involvement with PASS. Go back ten years, I went to my first Summit down in Dallas, TX. I attended sessions and went back to my hotel room, except one night. During that day I had spent a little time chatting with a company and they invited me to a party they were throwing that night. I went. And I met some people. They were just DBAs and developers, just like me, but, they were also involved in the organization that put on the event, PASS. I liked these people. So, I started volunteering which led to another Summit and another and writing and speaking and… well, let’s just say, getting involved was a good thing. Being passionate about it all paid off, literally and figuratively. I really do owe PASS and the people that make it up a lot.

So, there are a lot of passionate people in this little gang of ours. And some of those passionate people didn't like the outcome of the selection process. Being passionate, they voiced their opinions. LOUDLY. At length. Some of what they said had merit. Some of what they said was just hurt feelings. Some of what they said was a complete misunderstanding of how things worked within the committees and the selection process. But a lot of passionate people, who care about PASS, argued for a little while about the Summit selections. And, being a passionate guy, I took part. A lot of the work I did for the committee wasn't seeing the light of day (more on that later, maybe, depending on how some internal communications turn out) and I was quite passionate about that. I don't know this for a fact, but I suspect pretty strongly that my passion, what's more, my public passion, around this topic made some people angry. I'm positive that others' passion for the topic, regardless of the rightness or wrongness of their cause, definitely made people angry. Here's where I get in trouble.

Get over it.

If we didn’t care about PASS and what the organization has done for us, and how we’d like to help it, and help others, and grow it, and reward ourselves (because I do believe everyone is fundamentally greedy, might as well acknowledge it), and just plain replicate the experience for others that I’ve had (because it’s been an overwhelmingly positive experience, I can’t say enough good things about PASS), then there wouldn’t be any passion. And if there was no passion, there would be no brouhaha and hurt feelings and the developing cliques (oh yeah, people are drawing lines like this was a war in the Balkans, apropos on the 100th Anniversary of World War I). But you know what, if there wasn’t any passion for, in, and around this organization, then it wouldn’t be the organization that it is.

It’s a great organization and people are going to be passionate about it. Cope. Passion is going to lead people to saying negative as well as positive things. Deal. People just might say negative things about you. Develop an epidermis.

Look, we should be able to disagree without being disagreeable, but passion leads us down dark roads sometimes. Let’s try to be understanding of that fact and recognize that the passion that makes this organization great is also the one that’s going to lead to conflict sometimes. Let’s just try to remember that and maybe we’ll be able to work towards sharing the great things this organization does with others and fight with each other less. Maybe.

NOTE: I made an edit about the work I did on the selection committee. It was on a track that I didn’t submit for. There was no way my work there could influence my selection. Plus the fact that the abstract evals and speaker evals were done by two different teams of people. Just want to be clear about that.

Jun 25 2014

The Utility of Execution Plans in Natively Compiled Procedures

I’m actually having problems identifying the utility of execution plans when working with natively compiled procedures. Or, put another way, why bother? I’ve posted a couple of times on natively compiled procedures and SQL Server execution plans. I’ve found the differences interesting and enlightening, but I’m seriously questioning why I should bother, at least currently. I’m sure there will be many changes to the behaviors of the natively compiled procedures and their relationship with execution plans. But right now, well, let’s look at an example. I have three simple tables stored in-memory. Here’s the definition of one:

CREATE TABLE dbo.Address
    (
     AddressID INT IDENTITY(1, 1)
                   NOT NULL
                   PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
     AddressLine1 NVARCHAR(60) NOT NULL,
     AddressLine2 NVARCHAR(60) NULL,
     City NVARCHAR(30) COLLATE Latin1_General_100_BIN2 NOT NULL,
     StateProvinceID INT NOT NULL,
     PostalCode NVARCHAR(15) NOT NULL,
     ModifiedDate DATETIME
        NOT NULL
        CONSTRAINT DF_Address_ModifiedDate DEFAULT (GETDATE())
    )
    WITH (MEMORY_OPTIMIZED = ON);
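
One assumption baked into that script: the database already has a memory-optimized filegroup. If yours doesn't, something along these lines has to happen first (the database name and path here are made up):

-- Add a MEMORY_OPTIMIZED_DATA filegroup and a container to hold the data.
ALTER DATABASE InMemoryTest
    ADD FILEGROUP InMemoryTest_mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE InMemoryTest
    ADD FILE (NAME = N'InMemoryTest_mod',
              FILENAME = N'C:\Data\InMemoryTest_mod')
    TO FILEGROUP InMemoryTest_mod;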

I can then create the following code as a natively compiled procedure:

CREATE PROC [dbo].[AddressDetails] @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

When I call for an estimated plan (remember, no actual plans) I’ll get this:

Scan

If you click on it, you'll note that there's an index scan. But the costs are all zero. Everything is FREE! Or not. The execution time is 93ms. If I put an index on the City column, the execution plan changes to the one I showed previously, an index seek, and the execution time drops to 42ms. Clearly, the scans are costing something. Scans aren't necessarily bad and seeks aren't necessarily good, but it's hard to spot issues with execution plans with no costing involved at all. Which makes me wonder, should we bother with execution plans for the natively compiled procs? I'm honestly unsure.
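
Since SQL Server 2014 doesn't allow ALTER TABLE on memory-optimized tables, adding that index on City means dropping and recreating the table with the index declared inline. Roughly like this (a sketch; the index name is mine):

-- Any natively compiled procedures that reference the table (they're
-- schemabound) have to be dropped first.
DROP TABLE dbo.Address;

CREATE TABLE dbo.Address
    (
     AddressID INT IDENTITY(1, 1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
     AddressLine1 NVARCHAR(60) NOT NULL,
     AddressLine2 NVARCHAR(60) NULL,
     City NVARCHAR(30) COLLATE Latin1_General_100_BIN2 NOT NULL,
     StateProvinceID INT NOT NULL,
     PostalCode NVARCHAR(15) NOT NULL,
     ModifiedDate DATETIME NOT NULL
        CONSTRAINT DF_Address_ModifiedDate DEFAULT (GETDATE()),
     -- The new index; the BIN2 collation on City is required for this in 2014.
     INDEX IX_Address_City NONCLUSTERED (City)
    )
    WITH (MEMORY_OPTIMIZED = ON);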

For most query tuning, statistics matter a lot. I understand we still have room in Albany on July 25th. You can register here. I’m doing an all day session at SQL Connections in September in Las Vegas. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. I’d love to talk query tuning with you all day long.


Jun 18 2014

The Curse of Relational Databases

Let's face it, none of Information Technology is easy. Oh yeah, there are those few geniuses that have an absolute grasp of some small aspect of the stack, or those other geniuses that have a very shallow knowledge level but understand the entire stack. But the stack itself, it's vast, deep, wide, utterly unfathomable. So what do you do? You cheat. You take shortcuts. You ignore things you don't like/understand/appreciate. And then there's all the things you just don't know. Or, you cheat another way: you get experts that have drilled down on a particular technology so that they'll provide you with the knowledge you need. Ah, but then you have to listen to them, and what happens when your local genius (deep or wide) doesn't agree with your hired gun? Do you override your local person for the hired gun (I've seen this happen a ton where consultants were favored over in-house), or do you go with your local person (I've also seen this where the local person who has solved all the problems before may be in over their head now, but they've always been right and are therefore trusted)?

I just read (and I mean I finished about 90 seconds ago) this really interesting article on The Curse of the Excluded Middle. I won’t even pretend to you that I understood all of it. But, I did get a pretty fundamental concept out of it, this programming stuff is very hard, we’re going to take shortcuts to get through it, and those shortcuts come with a cost. The argument being put forward isn’t to somehow find a magic solution. It’s simply to acknowledge that there really is a cost, maybe even a cost you don’t completely understand. Further, that cost, and especially your lack of understanding of it, will come up and bite you on the behind.

Which brings me around finally to developers and databases. Relational databases are a pain in the bottom. They really are. Speaking just of SQL Server (where I spend most of my time), you have to work with a ridiculous, archaic language, T-SQL, in order to manipulate the data. And the rules of normalization, yeah, we can all learn them, but applying them makes every single aspect of coding harder. Plus the language lets us do things that it then interprets in horrendous fashion. Oh, and don't forget all the obscure and weird maintenance and configurations that you have to go through to keep the silly servers online and functioning correctly. Then there's the whole object/relational impedance mismatch thing to chew on our behinds even further. In short, I completely understand why developers would like to burn the entire edifice to the ground (come see one of my presentations when I talk about the "data persistence layer" that a particular dev team wanted to build). And all that is just the technical side of this mess. I'm not even going to address the personnel issues that come with the different focuses of responsibility between a developer and a DBA.

So when the developers bring in an Object Relational Mapping (ORM) tool, or they explicitly attempt to lash out at DBAs by going after a NoSQL database (and no, despite the new twist, it means NO F'ING ESSQUEELL, not Not Only SQL as many are saying now), I understand why they would do this. It short-circuits all the issues. We get around the problem. We speed development by eliminating that thing that we didn't completely understand and certainly didn't like and…. Hang on… Isn't there a darn good chance we're digging a hole here?

Yes.

Don't get me wrong. I see the need for unstructured data stores, ID/Value pairs, speed over consistency, speed over durability, the need to move fast because your competition is sure as heck trying to move fast. So NoSQL databases serve an absolutely valuable purpose and, used correctly, fix unique and difficult problems. A well structured ORM properly applied absolutely saves development time. But there's this nasty little surprise hidden behind the need, the sometimes seemingly desperate need, to completely get rid of relational storage. That surprise? Relational storage actually works, and works well, when applied to the appropriate problems in the appropriate ways. It provides a means of collecting information fairly quickly (although not as fast as many NoSQL databases), storing it efficiently (although maybe not as efficiently as some object databases), and returning it to the users on demand (and here relational does stick out again). And it does it all in one place, not one for collection, another for reporting, or some of the other strange perambulations I've seen people going through with some NoSQL implementations (again, not all, some are awesome, but many are horrific).

About twice a year I get to read a “death of the DBA” article that points to a technology or process or tool that’s going to eliminate the need for those nasty, ugly, difficult, relational databases and those freaks who try to keep them online and available. And about twice a year I see lists of the most needed workers in IT and guess what’s almost always there, yep DBAs. The fact is, relational storage does work. And instead of trying to eliminate it, or the DBA, or the code necessary to interface with it, embrace the stuff and learn to use it, or hire someone who actually knows how to use it and then listen to them. I’ve just seen too many places where the need to eliminate relational storage and DBAs is driven by one of two things, I have a shiny new hammer and everything is a nail, or, databases and DBAs are a pain because they make us do stuff we don’t want to, so let’s bypass them. Those are almost precisely the wrong reasons to go about moving to a NoSQL implementation, because you’re going to be ignoring stuff, as the Curse of the Excluded Middle talks about (and I know, it didn’t talk about databases, I’m extrapolating, hang with me here), and the things you ignore, or worse yet, don’t know about, are going to hurt and may hurt badly.

Jun 17 2014

Natively Compiled Procedures and Bad Execution Plans

I've been exploring how natively compiled procedures are portrayed within execution plans. There have been two previous posts on the topic, the first discussing the differences in the first operator, the second discussing the differences everywhere else. Now, I'm really interested in generating bad execution plans. But, the interesting thing is, I wasn't able to, or, rather, I couldn't see evidence of plans changing based on silly things I did to my queries and data. To start with, here's a procedure:

CREATE PROC [dbo].[AddressDetails] @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

And this is a nearly identical procedure, but with some stupid stuff put in:

CREATE PROC [dbo].[BadAddressDetails] @City VARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

I've changed the primary filter parameter to a VARCHAR when the data is NVARCHAR. This difference is likely to lead to differences in an execution plan, although not necessarily. If I load my tables up and update my statistics, then create the procedures and run them both with the same parameter values, I should detect any differences, right? Here's the resulting execution plan:

ActualPlan

It's an identical plan for both queries. In fact, the only difference in the plan that I can find is a CAST in the Index Seek operator for the BadAddressDetails procedure, as expected. But it didn't cause the plan… the plan… to show any other difference. However, execution is something else entirely. And this is where things get a little strange. There are two ways to execute a procedure:

EXEC dbo.AddressDetails @City = 'London';
EXEC dbo.AddressDetails 'London';

Interestingly enough, the first one is considered to be the slow way of passing a parameter. The second one is the preferred mechanism for natively compiled procedures. Now, if I execute these two versions of calling the procedure, I actually see different performance. The first call, the slow one, will run somewhere in the neighborhood of 342 µs. The other ran in about 255 µs. Granted, we're only talking about ~100 µs, but we're also talking about a 25% speed increase, and that's HUGE! But that's not the weird bit. The weird bit was that when I ran the good and bad queries together, the slow call on the bad query was consistently faster than the slow call on the good query. The fast call reversed that trend. And, speaking of which, the bad query, with the CAST, ran in about 356 µs, or ~25% slower.
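
If you want to reproduce this kind of comparison, here's one way to do it (a sketch, and not necessarily how the numbers above were captured). SQL Server 2014 doesn't collect execution statistics for natively compiled procedures by default, so you turn collection on with sys.sp_xtp_control_proc_exec_stats, run a batch of calls in one style, check the average, then reset and repeat with the other style:

-- Requires sysadmin; collection adds overhead, so turn it off when done.
EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 1;

DECLARE @i INT = 0;
WHILE @i < 1000
BEGIN
    -- Swap in: EXEC dbo.AddressDetails @City = 'London'; for the second pass.
    -- Note the result sets still come back to the client, which skews things some.
    EXEC dbo.AddressDetails 'London';
    SET @i += 1;
END;

SELECT OBJECT_NAME(ps.object_id) AS ProcName,
       ps.execution_count,
       ps.total_elapsed_time / ps.execution_count AS AvgElapsedMicroseconds
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.AddressDetails');

EXEC sys.sp_xtp_control_proc_exec_stats @new_collection_value = 0;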

The execution plan really didn't show any indication that this would be slower, which led me to my next test. I updated my Address table so that all the values were equal to 'London.' Then, because statistics are not maintained on in-memory tables automatically, I updated the statistics:

UPDATE STATISTICS dbo.Address WITH FULLSCAN, NORECOMPUTE;

With the statistics up to date, I dropped and recreated the procedure (there is no recompile with natively compiled procedures, something to keep in mind… maybe, more in a second). So now, the selectivity on the index was 1. The most likely outcome, an index scan. Guess what happened? Nothing. The execution plan was the same. I then went nuts, I converted all my tables so that a horrific mishmash of data would be brought back instead of clean data sets and I put data conversions in and… nothing. Index Seeks and Nested Loops joins. Weirdness.

I’m actually unsure why this is happening. I’m going to do more experimenting with it to try to figure out what’s up. But, that lack of recompile, maybe it doesn’t matter if, regardless of data distribution, you’re going to get the same plan anyway. I’m really not positive that looking at the execution plan for natively compiled procedures does much of anything right now. However, these tests were a little bit subtle. I’ll load up more data, get a more complex query and then really mess around with the code to see what happens. I’ll post more of my experiments soon.

I promise not to experiment on you though when I’m teaching my all day query tuning seminars. There are a bunch coming up, so if you’re interested in learning more, here’s where to go.  Just a couple of days left before Louisville and I’m not sure if there’s room or not, but it’s happening on the 20th of June. Go here to register. Albany will be on July 25th, but we’re almost full there as well. You can register here. SQL Connections is a pretty cool event that takes place in September in Las Vegas. In addition to regular sessions I’ll be presenting an all-day session on query tuning on the Friday of the event. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. Let’s get together and talk.


Jun 10 2014

Differences In Native Compiled Procedures Execution Plans

All the wonderful functionality that in-memory tables and natively compiled procedures provide in SQL Server 2014 is pretty cool. But changes to the core of the engine result in changes to things that we may have developed a level of comfort with. In my post last week I pointed out that you can't see an actual execution plan for natively compiled procedures. There are more changes than just the type of execution plan available. There are also changes to the information available within the plans themselves. For example, I have a couple of stored procedures, one running in AdventureWorks2012 and one in an in-memory enabled database with a few copies of AdventureWorks tables:

--natively compiled
CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

--standard
CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
AS
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    Person.Address  AS a
                JOIN Person.StateProvince  AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN Person.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
GO
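
As an aside, if you'd rather not click the SSMS button, the estimated plan for each procedure can be captured as XML with SET SHOWPLAN_XML (a sketch; run it in each database, and note that the SET statement has to be alone in its batch):

SET SHOWPLAN_XML ON;
GO
-- Not executed; returns the estimated plan as XML instead.
EXEC dbo.AddressDetails @City = N'London';
GO
SET SHOWPLAN_XML OFF;
GO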

The execution plans are obviously a little bit different, one going against in-memory tables and indexes and the other going against standard ones. However, that’s not the point here. This is the point. One of the first things I always check when looking at a new execution plan is the first operator, the SELECT/INSERT/UPDATE/DELETE operator. Here it is from the estimated plan of the query against the standard tables:

StandardSelectProperties

All the juicy goodness of the details is on display including the Optimization Level and Reason for Early Termination, row estimates, etc. It’s a great overview of how the plan was put together by the optimizer, some of the choices made, useful information such as the parameters used, etc. It’s great. Here’s the same thing for the natively compiled procedure:

NativeSelectProperties

Uhm… where are all my wonderful details? I mean, honestly, everything is gone. All of it. Further, what's left, I'm pretty sure, is nothing but a lie. Zero cost? No, but it's obviously not from the standard optimizer estimates either, so, effectively zero. I'm pretty sure Physical Operation is just there as an oversight. In short, this is a different game. Yes, you will still need to evaluate execution plans for natively compiled procedures, but we're talking a whole different approach now. I mean, great googly moogly, there are no parameter compile time values. Is that just ignored now? Are the days of bad parameter sniffing behind us, or are the days of good parameter sniffing gone forever? And it's not just the SELECT operator. Here are the properties for a Nested Loops operator. First the standard set:

StandardNestedLoops

And, the natively compiled procedure:

NativeNestedLoops

Now, except for the fact that everything is FREE, the differences here are easier to explain. Execution Mode is applicable to columnstore indexes, and none of those are available yet in in-memory storage, so I’m not shocked to see that property removed. Same for the others. But this complete lack of costing is going to make using execution plans, always a problematic proposition with only estimated values available for so many things, even harder. It might even make it so that all you really need to do is look at the graphical plan. Drilling down on the properties, until meaningful data starts to appear there, might be a waste of time for natively compiled procedures.

I’ll keep working on these. Next up, can you get a “bad” execution plan with a natively compiled procedure? We’ll find out.

Just a reminder that I'm taking this show on the road. I'm doing a number of all day seminars on query tuning at various events in multiple countries. Louisville has almost filled the room we have available on the 20th of June. Go here to register. But don't wait. I'm also going to be in Albany on July 25th, but we're almost full there as well. You can register here. If you were thinking about attending SQL Connections in September in Las Vegas, I'll be doing a full day on query tuning there in addition to regular sessions. Go here to register for this great event. In Belgium in October, I'll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event.


Jun 06 2014

Speaker of the Month, June 2014

It's not like I can't find plenty of great presentations here in the US, but while I was over in Belgium at Techorama I checked out several of the presenters there. They were awesome. This was the first ever Techorama. It's a developer focused event, but there was stuff there for data-centric people too. They had a great international collection of speakers from all over. The venue was a movie theater, which was a lot of fun to present in, although maybe a little too comfy for watching presentations (I fell asleep in one, I sure hope I didn't snore). It was such a great event that I decided to pick my speaker of the month from there. I saw a bunch of very good presentations (even the one I fell asleep in was good, the parts I saw), but one stood out for me, both because of the topic and the presentation of the topic. I'm giving my speaker of the month award to Tiago Pascoal (b|t) of Portugal for his presentation at Techorama, "My Code is Ready, Now What."

Tiago is a Microsoft MVP for Application Lifecycle Management (ALM) from Portugal, or, as he himself put it, "on the ass of Europe." Pardon the language, but that was funny. I loved watching Tiago present. He was really funny, which was excellent because discussing ALM can be pretty dry. He said several times as he was presenting stuff, "I should get a monkey to do this for me." It was great. I loved the way he discussed things, stating matter-of-fact things like, regarding code in source control, "It's 2014, everyone is doing this now," and his ease and manner of just assuming that, of course, the database is treated the same way. I liked the way he talked about provisioning, comparing it to pets vs. cattle. Do you want to have to pamper and groom a server to get it online, or is it just one more cow in the herd? Great stuff. I also loved how free and easy he was with typing. He demoed in a raw, live manner and got it all to work too. His slides had great pictures that both made his point and were entertaining. I really loved it. His demonstration of Octopus was so smooth I'm actually pretty jealous.

I don’t have much to offer Tiago for improvements. I loved his slides, but the look and feel within them wasn’t completely consistent. Minor nit, but I have to say something. I loved how he typed through the demos instead of having them canned (which I do), but it did sometimes slow down the flow, just a little. Again, minor nit. The presentation was just that good.

I've no idea where he's presenting next. He is on Lanyrd (yay), but doesn't have anything upcoming. I can heartily recommend going to see him speak.