Category: SQL Server 2008

Nov 30 2012

Sharing the Love

Just a few blog posts that you ought to go and read.

First up, Tom LaRock maintains a listing of SQL bloggers split up into various cleverly named groups to show you where to go to get good information. This really is an excellent collection of bloggers. It’s the people I go to when I need information. Some of them are better resources than the Books Online when they post something. Personally, I’ve made the list for the last several years, but Tom has decided that I’m worth of elevation, so I’ve gone from the Model database to the Master database. Thanks Tom. One blog that’s not on Tom’s list is Tom’s blog. You should be reading that regularly too. And congratulations to Tom again on making MCM.

Next, one of the bloggers on Tom’s list, and a friend, is Aaron Bertrand. Aaron has posted pretty much everything you need to know about how to get the most out of DBCC, not at his blog. I was considering a blog post on this myself but after reading this, why bother. He has it covered, up to and including linking over to Paul Randal’s advice on how to break up your DBCC checks (and Paul would know since he wrote the silly thing). Seriously, go and read Aaron’s blog post right now. It’s must reading for all the DBAs who are not Paul.

Oct 22 2012

Clustered Indexes Have Statistics Too

It may seem obvious, but I’ve heard more than one person suggest to me that statistics on a clustered index just don’t matter. That if the clustered index can satisfy a given query, it’s going to get selected. That just didn’t make any sense to me, but I haven’t seen anyone set up a test that shows how it might work one way or the other. Here you go.

First, I’m going to create a table and load it up with data. I’m intentionally using strings because I don’t want to confuse the ease of management of integers within indexes. I also went for one column that would have a very attractive set of statistics and one that would have a very ugly set. Also, because we’re only dealing with two columns at any given juncture, either a clustered or a non-clustered index would be a covering index. Finally, I didn’t mark the clustered index as unique because I wanted the non-selective clustered index and the highly selective clustered index to both have to deal with that extra bit of processing. Here’s how I set up the table and the data:

CREATE TABLE dbo.IndexTest (
SelectiveString VARCHAR(50),
NonSelectiveString VARCHAR(2))

WITH Nums
AS (SELECT TOP (100000)
ROW_NUMBER() OVER (ORDER BY (SELECT 1
)) AS n
FROM master.sys.all_columns AS ac
CROSS JOIN master.sys.all_columns AS ac2
)
INSERT INTO dbo.IndexTest
(SelectiveString,
NonSelectiveString
)
SELECT n,
CASE WHEN n % 3 = 0 THEN 'ab'
WHEN n % 5 = 0 THEN 'ac'
WHEN n % 7 = 0 THEN 'bd'
ELSE 'aa'
END
FROM Nums;

From there I created the first two indexes:

CREATE CLUSTERED INDEX ClusteredSelective ON dbo.IndexTest
(SelectiveString);

CREATE NONCLUSTERED INDEX NonClusteredNonSelective ON dbo.IndexTest
(NonSelectiveString);

Then I ran each of these queries, both of which are actually going after fairly selective bits of data, although largely relatively speaking in terms of the second query:

SELECT * FROM dbo.IndexTest AS it
WHERE SelectiveString = '2323';

SELECT * FROM dbo.IndexTest AS it
WHERE NonSelectiveString = 'aa';

This resulted in the following two execution plans:

As you can see, the clustered index was used in the first query. It makes sense because we’re querying against the clustered key and it’s a very highly selective key. The second query, despite being against a fairly non-selective key, 48,000 rows out of 100,000, used the non-clustered index. If I drop the non-clustered index and use just the cluster for the second query, the number of reads goes from 110 to 299 despite the fact that the same data is being returned. Clearly there’s a huge advantage to how data is ordered. Also, clearly, the fact that the statistics suggest that the cluster can’t immediately satisfy the query makes the optimizer choose other options. But, what happens if we change the indexes like this:

CREATE NONCLUSTERED INDEX NonClusteredSelective ON dbo.IndexTest
(SelectiveString);

CREATE CLUSTERED INDEX ClusteredNonSelective ON dbo.IndexTest
(NonSelectiveString);

Then, when I rerun my queries, I get these execution plans:

At least to my mind, it’s pretty clear. The statistics for the cluster clearly help the optimizer decide if that index is useful. Yeah, if I drop the nonclustered indexes and then run the queries the clustered index is always used, but that’s not because the cluster is selective or not, it’s because the cluster is the table.

I’m not sure where this concept that the statistics of a clustered do not matter, but, from these tests, it seems that they do.

And remember, in just a couple of weeks I’ll be doing 7 hours of query performance tuning instruction at the PASS Summit. You can sign up, as far as I know, right up to the day of the event. The name of the session is Query Performance Tuning: Start to Finish. I cover gathering metrics, understanding the optimizer, reading execution plans, and tuning queries. It’s a beginner’s level to intermediate course. It should be a lot of fun. Go here to register. I hope to see you there.

Sep 24 2012

Interviewing a DBA

I’m not a fan of trivia style interview questions. Yes, I ask a few because you have to in order to immediately eliminate the completely unqualified applicants. Even those types of questions, in my opinion, need to be focused on concepts and not syntax. The reason we have the Books Online with SQL Server is because you shouldn’t have to memorize every possible command along with all their parameters. Want to know how to write a MERGE query? Look it up. What does a MERGE query do? That you ought to know. I think concepts are important. Questions about the recovery models within SQL Server aren’t trivia about the system, they’re trying to get to your understanding of how point in time recovery works.

I don’t really like posting interview questions. And most of the time when I’ve seen interview questions posted (even mine), they’re pretty trivial stuff that doesn’t really get to whether or not the person you’re trying to hire is a good fit for the position and your team. I also don’t like posting interview questions because some people will try to use them to study up and attempt to BS their way into a position they frankly don’t deserve and haven’t earned. SQL Server knowledge and experience comes from using it to solve problems out in the world and protecting the information generated by a business.

That’s why I love this question. And I don’t mind sharing it with you because you can’t really memorize an answer to it:

You get a call from one of the business people. They tell you that the database is running slow. What do you do?

This is completely and utterly open-ended. It can go anywhere. In fact, it’s going to go where you lead it. For example, you could say “I first look at the Windows server error logs.” OK, that’s fine (several people I’ve interviewed started there). What indications would you find there that the server is running slow or what would you find there to show why the server is running slow? Suddenly, maybe you don’t want to look at the error logs for the server any more, or maybe you do. But you get the idea. There is no single correct answer here. There are however, lots of very problematic paths, and I’m going to let you go down them. I had one guy insisting that the very first thing he needed to do after the phone call was take a look at the application code to see the method used to make the call to the database. We spent quite a bit of time exploring why this seemed to be the best approach to him. Was it? I’m not saying. No hints on this one. Your answer for this question, is your answer, and that’s why I love it.

Further, as we explore this question, and I’ve spent anywhere from 10 minutes up to an hour working on it as part of an interview, I’m also getting to see how you deal with problematic situations, what your logic chain looks like, what your understanding of SQL Server is, and, most importantly, how you fit into the team. Because with an open-ended question like this, we get to talk. We’re way beyond silly trivia contests now.

Before you think this is unfair to people who aren’t performance experts, fine, let’s talk about what happens when you get an alert that the server is offline. Not a systems person? OK, we just got an alert that a database consistency check failed, now what? See, the point is to go on an adventure where we explore your knowledge and approach. I just have to work hard to make sure we stay somewhat on topic so that I can assess your knowledge and skill level.

Now, if I approach any of these questions and your response is to reject them out of hand, something I’ve run into, then we’re done. I’m not going to focus on trivia, which is how lots of people prep for interviews. I expect you to have concepts, process, logic, and methods available from your time studying and learning. So if we interview, be ready for this exploration, not a trivia contest. And the only way to really prepare is to get experience and knowledge by actually working with SQL Server.

Oh, and sometimes, I ask questions or make statements that are wrong. Sometimes it’s on purpose. Other times, it’s because I screwed up or was ignorant. But you can’t sit there agreeing with me. You better be paying attention because I might be testing you further.

This type of question is just too perfect for understanding how much you know about SQL Server.

Want to start to prepare for answering this kind of question? I’ve got an opportunity for you. At the PASS Summit 2012 this year, I’ll be running an all-day pre-conference seminar called Query Performance Tuning: Start to Finish. In it, I’ll cover quite a bit of what might make it possible for you to answer this question should you be presented it in an interview. No, I’m not guaranteeing you’ll answer it correctly. I’m just offering a chance to prepare. Sign up for the Summit today. There’s still a discount in place that can help you offset the cost of the seminar until the 30th of September.

Sep 07 2012

SQL Server vs. Oracle

Just so we’re clear, I use SQL Server. I like SQL Server. But, this doesn’t mean I have anything against Oracle. It’s fine. It’s good. But, I know very little about it. However, throughout my career I’ve found myself needing to understand it better. Either because I’m trying to train Oracle people to better use SQL Server and I need to be able to speak a little of their language to facilitate translation. Or, because I’m defending SQL Server on some technical point that the Oracle people don’t completely understand. Or, because I’ve said something stupid about Oracle in my ignorance.

Now, you know how busy you are, and I know how busy I am, so I doubt either of us has the time we really need to learn Oracle much. So, what do you do? Well, Red Gate Software, who straddles the worlds between Oracle & SQL Server like the Bifrost between Midgard & Asgard, has started a series of conversations between two people who know something about each platform, Jonathan Lewis (blog) and me.

We had our first conversation talking about clustered indexes. We covered how they work in both platforms (not that differently) and they’re used and abused. Interestingly enough, according to Jonathan, clustered indexes just aren’t used that much within Oracle, despite the fact that they really do behave mostly the same way as they do within SQL Server, where we use them on most every table (or at least so I maintain you should). It was a great discussion (NOTE: not a fight, no one was nasty or mean, we talked).

We’re going to have another discussion. We’re going to be talking about temporary tables. Again, I don’t know much about Oracle, so please, this is not an attack, but apparently they don’t have the same concept of temporary tables as we do in SQL Server. We’re going to cover a lot of the myths and misperceptions surrounding temp tables on both Oracle and SQL Server, how they work and how they affect performance. I learned a lot during the last conversation and I don’t doubt I’ll learn a lot during this one. If you’re interested, please go to this web page and register.

And, I’d be remiss if I didn’t mention again, if you like learning about performance in SQL Server that you should consider attending the PASS Summit 2012. If you register now, you save $500, which is just enough to pay for my pre-conference seminar, Query Performance Tuning: Start to Finish. I’ll be covering all aspects of performance tuning from gathering metrics to understand which queries are running slow, to reading execution plans to understand why, to addressing the issues to fix the performance and make your queries hum. Please consider taking part. It’ll be a lot of fun and I’ll try like crazy to make it useful.

 

Aug 22 2012

24 Hours of PASS, Fall 2012

It’s time to get your learn on again. The schedule for the Fall 24 Hours of PASS is up and ready for registration. This is the Summit preview session, so many (most, all) of the speakers are showing off some of what you can learn at their sessions at the PASS Summit 2012 itself. It looks like a pretty exciting bunch of topics given by some of the best professionals in the industry.

I’ll be presenting Three Ways to Identify Slow Running Queries on September 20th, 1400 GMT. This is just a sub-set of the information that I’ll be presenting during my all day pre-conference seminar, Query Performance Tuning: Start to Finish. The full seminar I talk about how to measure the performance of your systems, identify which queries are causing you the most trouble, figure out what that trouble is, and show how to fix those queries. This session of 24HOP, I just focus down on three methods you can use, right now, to understand the most costly queries on your servers. If you want to know what to do about them, well, you’ll have to register for the seminar.

A lot of this information is derived from the new edition of my book, SQL Server 2012 Query Performance Tuning. So you can check that out too if you’re so inclined.

Jul 02 2012

Querying Information from the Plan Cache, Simplified

One of the great things about the Dynamic Management Objects (DMOs) that expose the information in plan cache is that, by their very nature, they can be queried. The plans exposed are in XML format, so you can run XQuery against them to pull out interesting information.

For example, what if you wanted to see all the plans in cache that had a Timeout as the reason for early termination from the optimizer? It’d be great way to see which of your plans were less than reliable. You could so like this:

WITH XMLNAMESPACES(DEFAULT N'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),  QueryPlans 
AS  ( 
SELECT  RelOp.pln.value(N'@StatementOptmEarlyAbortReason', N'varchar(50)') AS TerminationReason, 
        RelOp.pln.value(N'@StatementOptmLevel', N'varchar(50)') AS OptimizationLevel, 
        --dest.text, 
        SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1, 
                  (deqs.statement_end_offset - deqs.statement_start_offset) 
                  / 2 + 1) AS StatementText, 
        deqp.query_plan, 
        deqp.dbid, 
        deqs.execution_count, 
        deqs.total_elapsed_time, 
        deqs.total_logical_reads, 
        deqs.total_logical_writes 
FROM    sys.dm_exec_query_stats AS deqs 
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest 
        CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp 
        CROSS APPLY deqp.query_plan.nodes(N'//StmtSimple') RelOp (pln) 
WHERE   deqs.statement_end_offset > -1        
)   
SELECT  DB_NAME(qp.dbid), 
        * 
FROM    QueryPlans AS qp 
WHERE   qp.TerminationReason = 'Timeout' 
ORDER BY qp.execution_count DESC ;

 

I posted a similar version of this query once before (although, I think that one is a little broken). It works fine… But…

This query takes 25 seconds. A big chunk of that is parsing the XML. What if, for a simple query like there, where I’m not doing a lot of conversion & processing with the XML, we ignored it and went instead to something like this:

SELECT  DB_NAME(deqp.dbid), 
        SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1, 
                  (CASE deqs.statement_end_offset 
                     WHEN -1 THEN DATALENGTH(dest.text) 
                     ELSE deqs.statement_end_offset 
                   END - deqs.statement_start_offset) / 2 + 1) AS StatementText, 
        deqs.statement_end_offset, 
        deqs.statement_start_offset, 
        deqp.query_plan, 
        deqs.execution_count, 
        deqs.total_elapsed_time, 
        deqs.total_logical_reads, 
        deqs.total_logical_writes 
FROM    sys.dm_exec_query_stats AS deqs 
        CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp 
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest 
WHERE   CAST(deqp.query_plan AS NVARCHAR(MAX)) LIKE '%StatementOptmEarlyAbortReason="TimeOut"%';

 

Now, we’re no longer hooked into getting the XML parsed. But, surprisingly, performance is not much better, sometimes worse in my tests. It probably has something to do with performing a function on a column, the CAST of the query_plan column from XML to NVARCHAR(MAX). What can you do?

Well, there is one other place where execution plans are kept, sys.dm_exec_text_query_plan. Things are a little different in there. Instead of a plan with multiple statements in it, each of these plans is for an individual statement. This is why you must pass in the start & end offsets to call the query. That changes the result sets, a little. You get fewer rows back, but, you also get a lot less duplication, and, we don’t have to cast anything in the WHERE clause. Let’s check it out:

SELECT  DB_NAME(detqp.dbid), 
        SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1, 
                  (CASE deqs.statement_end_offset 
                     WHEN -1 THEN DATALENGTH(dest.text) 
                     ELSE deqs.statement_end_offset 
                   END - deqs.statement_start_offset) / 2 + 1) AS StatementText, 
        CAST(detqp.query_plan AS XML), 
        deqs.execution_count, 
        deqs.total_elapsed_time, 
        deqs.total_logical_reads, 
        deqs.total_logical_writes 
FROM    sys.dm_exec_query_stats AS deqs 
        CROSS APPLY sys.dm_exec_text_query_plan(deqs.plan_handle, 
                                                deqs.statement_start_offset, 
                                                deqs.statement_end_offset) AS detqp 
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest 
WHERE   detqp.query_plan LIKE '%StatementOptmEarlyAbortReason="TimeOut"%';

 

Performance on my system dropped from 30 seconds on average to 8 seconds on average. That’s a win by any measure. If you worked on a way to eliminate that wild card LIKE search, it would be even better. Note line 7 above. To be able to click on the query_plan column and see a pretty graphical execution plan, I just have to CAST the text to XML, but that’s not adding to the overhead of the query.

If you’re looking to search within your query plans, you’re still likely to be better off using XQuery to get sophisticated searches on the data, but for really simple stuff, using the sys.dm_exec_text_query_plan may offer you a faster alternative.

Jun 05 2012

Guest Blog

I was given the opportunity to put together a guest blog post for the MVP blog. I did a little something on determining whether or not you have high memory use through the use of a DMO. Check it out.

Jun 04 2012

How to Drop One Plan from Cache

While presenting this weekend at SQL Saturday #117 in Columbus, OH (great event, if you missed it, you missed it), I had what I thought was a little piece of throw-away code, but several people from the audience asked about it. Here it is:

DBCC FREEPROCCACHE(0x05000700618F532C40E190CE000000000000000000000000) ;

Not much to it is there?

The trick is, starting with SQL Server 2008, you can use the FREEPROCCACHE command to drop a single plan from the cache rather than completely clearing out the cache. I use it to show compile times & bad parameter sniffing and other things. You can use it to get rid of a plan in cache for whatever you might need to do that. You certainly don’t need to drop the entire procedure cache as people so frequently do. The only trick to using this is that you need to get the plan handle, that long, meaningless string inside the parentheses above. You can do that using this query (or several others):

SELECT  decp.plan_handle
FROM    sys.dm_exec_cached_plans AS decp
        CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE   dest.[text] LIKE 'CREATE PROC dbo.spAddressByCity%';

I’m joining the plans from cache that are displayed through sys.dm_exec_cached_plans to the query text through sys.dm_exec_sql_text and, in this case, searching for the CREATE PROCEDURE statement to find the one I’m interested in. That’s a quick & dirty way to get the job done. Simple stuff, but hopefully helpful.

May 31 2012

Never, Ever Use Clustered Indexes

This whole concept of the clustered index as a foundational structure within SQL Server is just plain nuts. Sure, I get the concept that if a table has a clustered index, then that index actually becomes the table. When you create a clustered index on a table, the data is now stored at the leaf level of the Balanced Tree (b-tree) page distribution for that index, and I understand that retrieving the data using a seek on that index is going be extremely fast because no additional reads are necessary. Unlike what would happen with a non-clustered index on a heap table.

Yes, I get that if I store my data in a heap, the only way to access the data is through the Index Allocation Mapping (IAM)  pages that define extents and this means that I don’t get the double-linked list of pages that occur within clustered indexes. I know that having to read the IAM leads to additional reads for a heap to look up within the IAM in order to find the locations of the data on the disk.

I realize that updating or deleting a clustered index is helped by being able to use the index itself to find the exact row that needs to be modified or removed. I’ve also seen the tests that show that clustered indexes work faster on inserts in the overwhelming majority of situations within SQL Server. But I still want you to stop using clustered indexes on all your tables within SQL Server. Why? Because that’s how Oracle databases are mostly designed.

I hope you’ve figured out by now that I’m joking about tossing out clustered indexes within your SQL Server databases. I do believe that, unless you have a very thoroughly tested exception, every table within SQL Server should have a clustered index for some of the reasons that I’ve listed above, as well as several others. But Oracle DBAs design their systems differently.

When I see a vendor that makes a product that is exactly the same on Oracle, SQL Server, and possibly DB2 or MySQL, I have to ask myself, just how well is that system going to perform. When I hear someone tell me to design the system using lowest common denominator T-SQL because “we don’t want to be locked into a particular vendor” I have to wonder, again, how are we going to make this system perform. Because, if Oracle likes heaps, but SQL Server likes clusters, how do you design for both? I’d say you can’t.

In fact, I’d argue that you need to design precisely for specific relational database management systems because, let’s face it, they don’t implement the fundamentals in the same way. If you mess up the fundamentals, you’ve just messed up your entire design.

Apr 30 2012

Changing DB_CHAIN Can Clear the Plan Cache

If you make changes to the settings of a database, it can cause the procedure cache to be cleared. Microsoft has documented changes that cause this for all procs within a database (scroll down to just above the examples). But guess what, if you change the DB_CHAINING option, it clears the cache too. Here’s a sample script to show it in action.

ALTER DATABASE Testing SET DB_CHAINING OFF; 
GO

CREATE PROCEDURE x 
AS 
    SELECT    * 
    FROM    test.dbo.A AS a2; 
GO

CREATE PROCEDURE y 
AS 
    SELECT    * 
    FROM    dbo.Table_1 AS t; 
GO

EXEC dbo.x;

EXEC dbo.y;

SELECT    deqs.creation_time 
FROM    sys.dm_exec_query_stats AS deqs 
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest 
WHERE    dest.text LIKE 'CREATE PROCEDURE x%' 
        OR dest.text LIKE 'CREATE PROCEDURE y%';

ALTER DATABASE Testing SET DB_CHAINING ON;

SELECT    deqs.creation_time 
FROM    sys.dm_exec_query_stats AS deqs 
        CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest 
WHERE    dest.text LIKE 'CREATE PROCEDURE x%' 
        OR dest.text LIKE 'CREATE PROCEDURE y%';

ALTER DATABASE Testing SET DB_CHAINING OFF;

The script is almost self-explanatory. I want to point out that I put in one cross-database query to imply the possibility of cross-database ownership or access, but also to show that regardless of what’s referenced, all queries from this database are flushed from cache.

The first of the simple DMO queries returns two rows, the second returns no rows because everything is out of the cache because of the change to the database. It’s a little thing, but since it wasn’t explicitly stated in the Microsoft documentation, I thought I’d toss this out there.