Category: SQL Server 2012

Jan 07 2013

Statistics Update Clarification

By default statistics are created automatically within SQL Server. And, by default, these stats are updated automatically based on a set of triggers. The triggers are defined as

  • If 0 rows – Any data added leads to a statistics update
  • If < 500 rows – 500 rows added causes a stats update
  • If > 500 rows – 500 rows + 20% of the number of rows causes a stats update (unless you enable a traceflag in 2012 in which case you get a proportional value instead of 20%).

There are some exceptions for temporary tables and some variations for filtered statistics and filtered indexes, but you get the idea. I was writing an article on statistics in preparation for another Oracle/SQL Server discussion (on, you guessed it, statistics) and I came to these rules and wrote something along the lines of “So even if you have 499 rows, in order to see the stats update you would need to have to add more than 500 additional rows, meaning you would get to 999 before you saw a statistics update.” Which made sense, but I was suddenly concerned, is that valid. I decided to clarify it for my own piece of mind.

Quick set up of a test table and loading 499 rows into followed. I also ran a simple query to generate some statistics on the column:

CREATE TABLE dbo.StatsTest (StatsColumn INT);

WITH Nums
  AS (SELECT TOP (499)
ROW_NUMBER() OVER (ORDER BY (SELECT 1
)) AS n
  FROM master.sys.all_columns AS ac
CROSS JOIN master.sys.all_columns AS ac2
)
INSERT INTO dbo.StatsTest
(StatsColumn)
SELECT n
FROM Nums;

--auto create stats
SELECT *
FROM dbo.StatsTest AS st
WHERE StatsColumn = 42;

This results in the follow set of statistics:

Stats1

 

If you click on it to look at the details you can see that it shows 499 rows sampled.

If I add two more rows to the statistics and then run a different query, guaranteed to require a statistics update:

SELECT	*
FROM	dbo.StatsTest AS st
WHERE	StatsColumn < 42;

If I rerun DBCC SHOW_STATISTICS I still see the same output. In short, it appears that my original assumption was correct.

I ran several more tests, and I could get to 999 without updating the statistics, but adding just one more row and breaking 500 additional rows and the statistics update fired. I needed to validate this behavior for myself and it worked.

Nov 30 2012

Sharing the Love

Just a few blog posts that you ought to go and read.

First up, Tom LaRock maintains a listing of SQL bloggers split up into various cleverly named groups to show you where to go to get good information. This really is an excellent collection of bloggers. It’s the people I go to when I need information. Some of them are better resources than the Books Online when they post something. Personally, I’ve made the list for the last several years, but Tom has decided that I’m worth of elevation, so I’ve gone from the Model database to the Master database. Thanks Tom. One blog that’s not on Tom’s list is Tom’s blog. You should be reading that regularly too. And congratulations to Tom again on making MCM.

Next, one of the bloggers on Tom’s list, and a friend, is Aaron Bertrand. Aaron has posted pretty much everything you need to know about how to get the most out of DBCC, not at his blog. I was considering a blog post on this myself but after reading this, why bother. He has it covered, up to and including linking over to Paul Randal’s advice on how to break up your DBCC checks (and Paul would know since he wrote the silly thing). Seriously, go and read Aaron’s blog post right now. It’s must reading for all the DBAs who are not Paul.

Nov 28 2012

HDInsight: First Pass

I installed HDInsight with no errors. The install is so brain-dead, I couldn’t see how there could be errors. But, I decided to fire up the service to start playing with it. That’s when I hit the errors. First up, I was getting logon errors for the Hadoop service itself. I tried resolving them manually, but was digging a hole. So, I uninstalled, and reinstalled, figuring I had missed an error message somewhere along the line. After this install, I didn’t get logon errors. Instead I got internal errors. Some more investigation and it appears that the service was marked for deletion. The uninstall worked, but somehow wasn’t complete. Yay! So, I did another uninstall after rebooting the server. Install #4 coming up.  And we’re back to logon errors. Specifically:

Starting IsotopeJS services
Starting isotopejs
System error 1069 has occurred.

The service did not start due to a logon failure.

Let’s check the logs, if any. Nothing in the c:\Hadoop directory. Checking the system logs to see if there’s anything there. Nothing I can see. No joy. I’ll go through it all again tonight.

Nov 21 2012

Final 2012 Learning Opportunity

There’s still a little time left in the year. I know some businesses have allocated training budgets and if they don’t spend the money, they lose it from their budget. So a few of you might still be looking for opportunities to get your learn on. I’ve got one more big one before the end of the year. You can try heading down to Florida for the Live 360 conference. I’ll be there along with several friends (love my #sqlfamily). I’m presenting two sessions, one on setting up an environment for performance testing using the new distributed playback capabilities from SQL Server 2012 and one on introducing the accidental DBA to backups. You can still register here to attend these sessions and all the other great ones being presented. Last chance to use up that training budget for the year, so don’t miss it.

Oct 22 2012

Clustered Indexes Have Statistics Too

It may seem obvious, but I’ve heard more than one person suggest to me that statistics on a clustered index just don’t matter. That if the clustered index can satisfy a given query, it’s going to get selected. That just didn’t make any sense to me, but I haven’t seen anyone set up a test that shows how it might work one way or the other. Here you go.

First, I’m going to create a table and load it up with data. I’m intentionally using strings because I don’t want to confuse the ease of management of integers within indexes. I also went for one column that would have a very attractive set of statistics and one that would have a very ugly set. Also, because we’re only dealing with two columns at any given juncture, either a clustered or a non-clustered index would be a covering index. Finally, I didn’t mark the clustered index as unique because I wanted the non-selective clustered index and the highly selective clustered index to both have to deal with that extra bit of processing. Here’s how I set up the table and the data:

CREATE TABLE dbo.IndexTest (
SelectiveString VARCHAR(50),
NonSelectiveString VARCHAR(2))

WITH Nums
AS (SELECT TOP (100000)
ROW_NUMBER() OVER (ORDER BY (SELECT 1
)) AS n
FROM master.sys.all_columns AS ac
CROSS JOIN master.sys.all_columns AS ac2
)
INSERT INTO dbo.IndexTest
(SelectiveString,
NonSelectiveString
)
SELECT n,
CASE WHEN n % 3 = 0 THEN 'ab'
WHEN n % 5 = 0 THEN 'ac'
WHEN n % 7 = 0 THEN 'bd'
ELSE 'aa'
END
FROM Nums;

From there I created the first two indexes:

CREATE CLUSTERED INDEX ClusteredSelective ON dbo.IndexTest
(SelectiveString);

CREATE NONCLUSTERED INDEX NonClusteredNonSelective ON dbo.IndexTest
(NonSelectiveString);

Then I ran each of these queries, both of which are actually going after fairly selective bits of data, although largely relatively speaking in terms of the second query:

SELECT * FROM dbo.IndexTest AS it
WHERE SelectiveString = '2323';

SELECT * FROM dbo.IndexTest AS it
WHERE NonSelectiveString = 'aa';

This resulted in the following two execution plans:

As you can see, the clustered index was used in the first query. It makes sense because we’re querying against the clustered key and it’s a very highly selective key. The second query, despite being against a fairly non-selective key, 48,000 rows out of 100,000, used the non-clustered index. If I drop the non-clustered index and use just the cluster for the second query, the number of reads goes from 110 to 299 despite the fact that the same data is being returned. Clearly there’s a huge advantage to how data is ordered. Also, clearly, the fact that the statistics suggest that the cluster can’t immediately satisfy the query makes the optimizer choose other options. But, what happens if we change the indexes like this:

CREATE NONCLUSTERED INDEX NonClusteredSelective ON dbo.IndexTest
(SelectiveString);

CREATE CLUSTERED INDEX ClusteredNonSelective ON dbo.IndexTest
(NonSelectiveString);

Then, when I rerun my queries, I get these execution plans:

At least to my mind, it’s pretty clear. The statistics for the cluster clearly help the optimizer decide if that index is useful. Yeah, if I drop the nonclustered indexes and then run the queries the clustered index is always used, but that’s not because the cluster is selective or not, it’s because the cluster is the table.

I’m not sure where this concept that the statistics of a clustered do not matter, but, from these tests, it seems that they do.

And remember, in just a couple of weeks I’ll be doing 7 hours of query performance tuning instruction at the PASS Summit. You can sign up, as far as I know, right up to the day of the event. The name of the session is Query Performance Tuning: Start to Finish. I cover gathering metrics, understanding the optimizer, reading execution plans, and tuning queries. It’s a beginner’s level to intermediate course. It should be a lot of fun. Go here to register. I hope to see you there.

Sep 24 2012

Interviewing a DBA

I’m not a fan of trivia style interview questions. Yes, I ask a few because you have to in order to immediately eliminate the completely unqualified applicants. Even those types of questions, in my opinion, need to be focused on concepts and not syntax. The reason we have the Books Online with SQL Server is because you shouldn’t have to memorize every possible command along with all their parameters. Want to know how to write a MERGE query? Look it up. What does a MERGE query do? That you ought to know. I think concepts are important. Questions about the recovery models within SQL Server aren’t trivia about the system, they’re trying to get to your understanding of how point in time recovery works.

I don’t really like posting interview questions. And most of the time when I’ve seen interview questions posted (even mine), they’re pretty trivial stuff that doesn’t really get to whether or not the person you’re trying to hire is a good fit for the position and your team. I also don’t like posting interview questions because some people will try to use them to study up and attempt to BS their way into a position they frankly don’t deserve and haven’t earned. SQL Server knowledge and experience comes from using it to solve problems out in the world and protecting the information generated by a business.

That’s why I love this question. And I don’t mind sharing it with you because you can’t really memorize an answer to it:

You get a call from one of the business people. They tell you that the database is running slow. What do you do?

This is completely and utterly open-ended. It can go anywhere. In fact, it’s going to go where you lead it. For example, you could say “I first look at the Windows server error logs.” OK, that’s fine (several people I’ve interviewed started there). What indications would you find there that the server is running slow or what would you find there to show why the server is running slow? Suddenly, maybe you don’t want to look at the error logs for the server any more, or maybe you do. But you get the idea. There is no single correct answer here. There are however, lots of very problematic paths, and I’m going to let you go down them. I had one guy insisting that the very first thing he needed to do after the phone call was take a look at the application code to see the method used to make the call to the database. We spent quite a bit of time exploring why this seemed to be the best approach to him. Was it? I’m not saying. No hints on this one. Your answer for this question, is your answer, and that’s why I love it.

Further, as we explore this question, and I’ve spent anywhere from 10 minutes up to an hour working on it as part of an interview, I’m also getting to see how you deal with problematic situations, what your logic chain looks like, what your understanding of SQL Server is, and, most importantly, how you fit into the team. Because with an open-ended question like this, we get to talk. We’re way beyond silly trivia contests now.

Before you think this is unfair to people who aren’t performance experts, fine, let’s talk about what happens when you get an alert that the server is offline. Not a systems person? OK, we just got an alert that a database consistency check failed, now what? See, the point is to go on an adventure where we explore your knowledge and approach. I just have to work hard to make sure we stay somewhat on topic so that I can assess your knowledge and skill level.

Now, if I approach any of these questions and your response is to reject them out of hand, something I’ve run into, then we’re done. I’m not going to focus on trivia, which is how lots of people prep for interviews. I expect you to have concepts, process, logic, and methods available from your time studying and learning. So if we interview, be ready for this exploration, not a trivia contest. And the only way to really prepare is to get experience and knowledge by actually working with SQL Server.

Oh, and sometimes, I ask questions or make statements that are wrong. Sometimes it’s on purpose. Other times, it’s because I screwed up or was ignorant. But you can’t sit there agreeing with me. You better be paying attention because I might be testing you further.

This type of question is just too perfect for understanding how much you know about SQL Server.

Want to start to prepare for answering this kind of question? I’ve got an opportunity for you. At the PASS Summit 2012 this year, I’ll be running an all-day pre-conference seminar called Query Performance Tuning: Start to Finish. In it, I’ll cover quite a bit of what might make it possible for you to answer this question should you be presented it in an interview. No, I’m not guaranteeing you’ll answer it correctly. I’m just offering a chance to prepare. Sign up for the Summit today. There’s still a discount in place that can help you offset the cost of the seminar until the 30th of September.

Sep 07 2012

SQL Server vs. Oracle

Just so we’re clear, I use SQL Server. I like SQL Server. But, this doesn’t mean I have anything against Oracle. It’s fine. It’s good. But, I know very little about it. However, throughout my career I’ve found myself needing to understand it better. Either because I’m trying to train Oracle people to better use SQL Server and I need to be able to speak a little of their language to facilitate translation. Or, because I’m defending SQL Server on some technical point that the Oracle people don’t completely understand. Or, because I’ve said something stupid about Oracle in my ignorance.

Now, you know how busy you are, and I know how busy I am, so I doubt either of us has the time we really need to learn Oracle much. So, what do you do? Well, Red Gate Software, who straddles the worlds between Oracle & SQL Server like the Bifrost between Midgard & Asgard, has started a series of conversations between two people who know something about each platform, Jonathan Lewis (blog) and me.

We had our first conversation talking about clustered indexes. We covered how they work in both platforms (not that differently) and they’re used and abused. Interestingly enough, according to Jonathan, clustered indexes just aren’t used that much within Oracle, despite the fact that they really do behave mostly the same way as they do within SQL Server, where we use them on most every table (or at least so I maintain you should). It was a great discussion (NOTE: not a fight, no one was nasty or mean, we talked).

We’re going to have another discussion. We’re going to be talking about temporary tables. Again, I don’t know much about Oracle, so please, this is not an attack, but apparently they don’t have the same concept of temporary tables as we do in SQL Server. We’re going to cover a lot of the myths and misperceptions surrounding temp tables on both Oracle and SQL Server, how they work and how they affect performance. I learned a lot during the last conversation and I don’t doubt I’ll learn a lot during this one. If you’re interested, please go to this web page and register.

And, I’d be remiss if I didn’t mention again, if you like learning about performance in SQL Server that you should consider attending the PASS Summit 2012. If you register now, you save $500, which is just enough to pay for my pre-conference seminar, Query Performance Tuning: Start to Finish. I’ll be covering all aspects of performance tuning from gathering metrics to understand which queries are running slow, to reading execution plans to understand why, to addressing the issues to fix the performance and make your queries hum. Please consider taking part. It’ll be a lot of fun and I’ll try like crazy to make it useful.

 

Aug 20 2012

Another Execution Plan… In the Cloud!

A couple of weeks ago I posted about how different execution plans look when using the Azure Management Portal. I promised to look at a more complex plan and here we are. The query looks like this:

SELECT	*
FROM	dbo.Agent AS a
		JOIN dbo.Resource AS r
		ON a.AgentId = r.AgentId
		JOIN dbo.MovieStageResource AS msr
		ON r.ResourceId = msr.ResourceId
		JOIN dbo.MovieStage AS ms
		ON msr.Movieid = ms.MovieID
		   AND msr.MovieStageDefinitionId = ms.MovieStageDefinitionId
		JOIN dbo.Movie AS m
		ON ms.MovieID = m.MovieId
WHERE	a.AgentId = 42;

I didn’t go nuts in creating a crazy complex query. I just wanted to see a standard set of different operators to see how things changed. As before, I’m working off a controlled database that has identical structures and data in both the Azure database and my local copy of SQL Server. This way, the likelihood of identical plans is increased. Plus, if there are differences, it’s coming down to the differences in compilers between the two platforms, which makes things more interesting.

Running the query on my local server, the execution plan looks like this:

 

If you click on it to blow it up, you’ll see a pretty standard plan. If we read it in logical fashion (left-to-right, making Rob proud) you’ll see that a nested loop joins data between the Agent table and another nested loop. That loop joins between yet another nested loop and the MovieStage table. That loop is between one more nested loop and the Movie table. The final loop is between the MovieStageResource table and the Resource table. Nothing shocking or weird.

Same query, different (?) execution plan from Azure

First thing that jumps out at you are the red operators. We’ll get to those in a minute. So the plan looks totally different, right? No, not really. It’s pretty much the same. The estimated operation costs are different, which is interesting, but the basic plan is the same.

First thing to check in every execution plan? The properties in the first operator. Here they are from SQL Server:

Zoom in to read this if you need to. It’s a Full optmization, but the reason for early termination is a TimeOut. This means we can’t really trust this plan. On another day you may even see the plan change with the same query. Certainly if a service pack came out or something else the plan could change. However, this query looks like it’s properly optimized, even though the optimizer says it isn’t. Let’s see the properties from Azure. I’ve zoomed in to the “More” properties to get at some details:

The first point to note is the Estimated Rows value. Here, in Azure, it is 40.369. Meanwhile, back in SQL Server it is 40.8236. Now these differences are pretty subtle. I sure wouldn’t be sweating out of data statistics because of a .4546 discrepancy, but it gives us a clear indication why the estimated values for identical operations against identical data and structures are different. I’ll also note that the Optimization Level on the Azure query was also Full and the Reason for Early Termination was a timeout. These are fun and interesting results.

Now, let’s look at the most costly operator, the Clustered Index Seek operation. First the properties in SQL Server:

This is where things get a little weird. We have an estimated number of executions at 1000 and an estimated number of rows at 1. The actual number of rows was 22. The estimated cost for the operator is 2.24226. All well and good. But when we look at the same values in the Azure operator:

If you look at the Estimated Executions, Actual Rows, Average Row Size, Estimated I/O Cost and Estimated CPU Cost, they’re identical (with some rounding differences). But… the estimated cost of the operation is different; 2.24226 in SQL Server and 1.50594 in Azure. How do I account for the difference? I don’t. I think we’re looking at differences in how the optimizer establishes costs, because everything else looks the same.

Clearly, while the optimizers behave in a very similar manner, after all, we got effectively identical execution plans, they are not the same. The small difference in the cost assigned by the optimizer informs us of this. I’ll have to play around some more with actually complex queries to see what else can be seen.

If you enjoy this kind of drill down on queries, query tuning, execution plans and all that fun stuff, I’d like to suggest you sign up for my full-day pre-conference seminar at the PASS Summit 2012 this fall.

Jul 16 2012

SQL In The City: London 2012, Recap

Presenting on Ring BuffersWow!

How’s that for a recap?

The concept for the SQL in the City events is pretty simple. Put on a free event that instructs people on SQL Server, Azure, and related technologies along with a healthy smattering of Red Gate tools. All teaching is done by some of the best people in the business (and me).

This was the second event in London. The concept was launched there last year and succeeded quite well. This year the event filled it’s registrations so quickly that Red Gate felt obligated to have a second day, which almost completely filled up too. There were more than 350 people in attendance on Friday, and then, on Saturday, a day off, another 250+ people showed up. That’s well over 600 attendees over the two days. And what people! The UK audience is just excellent. These people really pay attention to what you’re saying. They don’t make many comments while the presentation is going on, but oh my gosh the detailed questions you get afterwards. It’s just wonderful. Plus, these guys are part of my #sqlfamily. I got to meet several people that I’ve met before in both the US and the UK. I love spending time with Tobiasz, Dave, Kev, Annette, Jonathan, Thomas, Neil. I also got to meet people that I had interacted with only online and they’re wonderful in person. Thanks to everyone who sought me out, especially Colin and Stephanie. It was a real privilege to meet you two (although neither of you knows the other). It’s the interpersonal aspects of these events that makes them great.

It’s so nice to be able to relax and show people something like how to get a sandbox environment set up, but all the ways that using Red Gate tools to do it can help you make the job easier, faster and cheaper. Normally you can’t give an open answer when someone asks how to do something better or easier during a session. This venue makes that possible.

I put on three sessions during the day. One on how to improve performance, yours and your code, in T-SQL. I got to regale the crowd on all the evils of ddltbl (not a typo, you had to be there) as well as common, simple, mistakes made all the time in T-SQL code. I also did a session on sandbox deployments. While I’m personally against giving everyone & their brother a copy of the production database for development, I acknowledge that it is a good set of data to develop against. So, if you have to do it, you may as well use Virtual Restore to save some space. Finally, my last session was on some of the lesser used performance metrics that are actually more useful than people give them credit for.

I sat in on some of the other presentations and they were great. I really liked Steve Jones (blog|twitter) session on handling disasters. I also liked watching a new speaker, Annette Allen (twitter), stretch her legs for the first time. She was good. UK user groups take note. You have another resource available.

I had a blast presenting all these sessions, twice, and the crowds seemed receptive. I really appreciate everyone who attended and the excellent feedback that they politely (but firmly) provided. Then we had beer.

Yeah, you heard me. Wonderful, glorious, Red Gate beer served right there at the event. It was a great batch of Select * Ale. Highly recommended at the end of a hard day of T-SQL learning and networking.

It was a magnificently run event. The only complaint I heard was that we had an inadequate number of bathrooms for the men (which, I learned, are not called stalls in the UK, some humor doesn’t translate well). Thanks to Annabel Bradford and all the team at Red Gate who put the event together (even if I do work for them, it was a really well run event). You guys are magnificent.

If you missed a session while you were there or you want to see a session again, keep an eye on the SQL In The City web site. Videos of the sessions will be uploaded. If you weren’t there, you missed it. But, I have good news.

We’re taking the show on the road. We’re going to be hitting five cities in the US in September and October and then Seattle in (which I think is still in the US) in November. It’s going to be a lot of the same people presenting the same topics, but it’s also going to include a ton of excellent local speakers at each of the cities we hit. This means the excitement and education that SQL In The City represents will be accessible to lots more of you soon.

Jun 27 2012

The New Phone Book Is Here!

I can’t help it. I get really terribly excited when I publish a book. Maybe it should be old hat. Maybe I should be jaded. But I’m just a 12 year old (it’s been argued 10) in reality so I get really, really jumping up & down excited when I get that wad of paper and my name is on the cover. It just doesn’t seem to get old.

What am I talking about? Oh, sorry. Let me explain.

After about nine months of work, my new, revised edition, of the Query Performance Tuning book is available. I want to publicly, and loudly, thank Joe Sack(blog|twitter) for the incredible job he did as tech editor. His hard work, and ruthless criticism, made this book what it is. Despite the scar tissue, thank you Joe. I also want to thank the series editor, Jonathan Gennick (blog), for once again placing any faith in me at all. Thanks for the opportunity Jonathan.

Please let me know what you think of it, assuming you pick up a copy. I think it should prove very helpful. At least I sure hope it is. And I’m told the printing errors have been fixed.

And, as a reminder, I’m going to be hosting an all day pre-conference seminar this fall at the PASS Summit 2012 that will be entirely based on the book. The name of the session is Query Performance Tuning: Start to Finish. I’d suggest registering for the Summit now. The savings can help pay for the seminar.