Dec 13 2010

Life/Work Balance

Apple iPad - Work Life Balance ToolTechnology, especially information technology, is the greatest thing to ever happen to mankind, freeing us from toil and drudgery. Technology, especially information technology, is a pernicious evil taking over our lives forcing us to work harder and longer. Depending on the time of day, the day of the week, my mood, my wife’s mood, or the direction the wind is blowing, either of these statements could be true.

The fact is, I love technology and I do have to wrestle with keeping it from taking over my life, but only because I have so much fun with the toys that technology brings. You want to know how much I love toys, ask me about my Droid sometime. Pull up a chair. We’re going to be here a while. The trick is, finding that sweet spot, where you use the tools presented to you in order to enhance your life while enhancing your work. Just enough of each and you can be a hero at home and on the job and have a blast doing both.

The one thing I really hate about being a DBA is being on call. I’m not sure why but most systems fail one of three times, right when I’m going to sleep, so I get to stay up another 1-3 hours fixing the issue; around 3AM, so I can spend about 1/2 an hour figuring out how to log into the network before I spend 1-3 hours fixing the issue; or, when I’m half way up a mountain with the Scouts, in which case, I just have to call the boss and get someone else engaged (and yes, I do prefer these last failures to the others). The real trick here is, to get your systems set up so that you don’t have constant emergencies, regardless of the time of day. How do you do this? Proactive monitoring.

Red Gate handed me 10 iPad’s along with 10 licenses for SQL Monitor, their new monitoring tool. I’m to give these 10 devices away to the best responses in the comment section of this post to the question I’m going to put to you shortly. That’s right, you can get out in front of the issues you’re running into and avoid whenever it is that you get called from work and get an awesome toy at the same time.

The goal is life/work balance. Notice which one I put first. That’s the priority. Here’s your question:

What do you think the most common cause of server outages is, why, and how would being able to monitor your systems remotely help solve this issue, thereby improving the quality of your life?

The contest runs from now until 11:59 PM, December 17th, 2010. Please reply below, but keep it pithy. Don’t publish your version of War & Peace in the comments (I might delete it). I’m the sole judge and arbiter (which means, I probably will delete anything resembling War & Peace). One entry only. Make sure there’s a means of contacting you in the post, or I’ll give your iPad to someone else. Remember, pithy is our watch word. You can answer this question in three well constructed sentences. If you win, I’ll want to get a picture of you using the iPad to monitor your systems remotely. Plan on sending me that picture by January 31st. An interesting picture. Something with you sitting in your cube at work just won’t fly.

That’s it. I’ll announce the winners in a new post on the blog at the end of the week. Here are the official rules:

  1. The contest is open to professionals with SQL Server monitoring responsibility. Entrants must be 18 years old or over.
  2. Entries must be received by Friday, December 17, 2010. The contest organizers accept no responsibility for corrupted or delayed entries.
  3. Employees of Red Gate, the contest organizers and their family members are not eligible to participate in the contest.
  4. Entries are limited to one per person across the three simultaneous contests hosted on SQLServerCentral.Com, BrentOzar.Com, and ScaryDba.Com.
  5. The organizers reserve the right, within their sole discretion, to disqualify nominations.
  6. The organizers’ decisions are final.
  7. Red Gate Software and those involved in the organization, promotion, and operation of the contest and in the awarding of prizes explicitly make no representations or warranties whatsoever as to the quality, suitability, merchantability, or fitness for a particular purpose of the prizes awarded and they hereby disclaim all liability for any loss or damage of any kind, including personal injury, suffered while participating in the contest or utilizing any prizes awarded. 
Jul 21 2010

SQL University: Introduction to Indexes, Part the Second

Welcome once more to the Miskatonic branch of SQL University. Please try to concentrate. I realize the whipoorwills singing outside the window in a coordinated fashion that sounds almost like laboured breathing can be distracting, but we’re talking about indexes here.

We left last class with a general idea what an index is, now it’s time for some specifics. There are several different kinds of indexes, as we talked about last class. But the two you’re probably going to work with the most are clustered, non-clustered. Each of these indexes is stored in a structure called a B-Tree, a balanced tree, not a binary tree. That’s a very important distinction.

A B-Tree is a double-linked list that is defined by the keys of the indexes on the top and intermediate pages, and at the leaf level by the data itself in the case of clustered indexes. Some of you no doubt think I’m quoting from De Vermis Mysteriis. Basically, for our purposes, a B-Tree consists of a series of pages. There is a top page, or root page, that defines the beginning of the index key. It points to a series of intermediate pages. Each intermediate page contains a range, a previous and a next value. These all point to each other, hence, double linked. The idea is that SQL Server can quickly identify which intermediate page has the pointers down to the leaf node, the final node in the stack. The values of these pointers are defined by the key of the index, the column or columns that you define when you create the index. There are always at least two levels, leaf & root, but there can be more, depending on the amount of data and the size of the keys. Just remember, the size of the key, which refers both to the data types in the key and the number of columns, determines how many key values can get on a page, the more key values on a page, the faster access will be, the fewer key values, the more pages that have to be read, and therefore, the slower the performance.

In general the purpose is to be able to quickly navigate to a leaf or set of leaf pages. When a B-Tree is used and the query engine is able to navigate quickly down to the leaf needed, that is an index seek. But when the B-Tree has to be moved through, in whole or in part, scanning for the values, you’re looking at an index scan. Obviously, in most cases, a seek will be faster than a scan becuase it’s going to be accessing fewer pages to get to the leaf needed to satsify the query. Just remember, that’s not always true.

Let’s get on to the indexes. It’s already been mentioned, but it bears repeating, the principle difference between a clustered and non-clustered index is what is at the leaf level. In a non-clustered index, it’s simply the key values and an values added through the use of the INCLUDE option along with a lookup value to either the clustered index key or an identifier within a table. In a clustered index, the data is stored down at the leaf. This is why people will frequently refer to a clustered index as being “better” than a non-clustered index, because you’re always going directly to the data when you’re looking information up within a clustered index. But, as with the scans vs. seek argument, this is not always true either.

I mentioned that a non-clustered index refers back to the clustered index, if there is one on the table. Because the data is stored at the leaf level of the clustered index, when you need to retreive other columns after performing a seek on a non-clustered index, you must go and get those columns from the clustered index. This is known as a key lookup, or in older parlance, a bookmark lookup. This operation is necessary when data not supplied by the non-clustered index, but can be very expensive because you’ve just added extra reads to your query.

What if there isn’t a clustered index on the table? What does the non-clustered index use to find other columns? If the table doesn’t have a clustered index, then that table is referred to as a heap. It’s called a heap because the data is simply stored in a pile, with no logical or physical ordering whatsoever. With a heap, SQL Server takes it on itself to identify the leaf level storage and creates a row id value for all the rows in the table. This row id can be used by the non-clustered index to find the data. That is referred to by the completely arcane and incomprehensible term, row id lookup. You might be thinking, hey, that means I don’t have to create a clustered index because SQL Server will create one for me. You’d be wrong. Maintaining the row id is an expensive operation  and it doesn’t help in retrieving the data in an efficient manner. It’s just necessary for SQL Server to get the data back at all. In general, this is something to be avoided.

A non-clustered index doesn’t necessarily have to perform a lookup. If all the columns referred to in a query are stored within a non-clustered index, either as part of the key or as INCLUDE columns at the leaf, it’s possible to get what is called a “covering” query. This is a query where no lookup is needed. Indexes that can provide a covering query everything it needs are referred to as covering indexes. A covering query is frequently one of the fastest ways to get at data. This is because, again, depending on the size of the keys and any INCLUDE columns, a non-clustered index will have more information stored on the page than a clustered index will and so fewer pages will have to be read, making the operation faster.

By and large, a good guideline is to put a clustered index on all tables. SQL Server works extremely well with clustered indexes, and it provides you with a good access mechanism to your data. If you don’t put a clustered index on the table, SQL Server will create and maintain a row ID anyway, but as I said before, this doesn’t save much work on the server and it doesn’t provide you with any performance enhancement.

That’s a basic introduction to the three concepts of the clustered index, the non-clustered index and the heap. The points I’d like you to remember are:

  • Indexes are stored in Balanced Trees
  • Balanced Trees have, generally, three levels, root page, intermediate page, and leaf page
  • In clustered indexes, data is stored at the leaf page
  • In non-clustered indexes, a pointer is maintained back to the clustered index or the row id
  • A heap is a table without a clustered index

Remember those things and you can really begin to dig down on how indexes work. Understanding how they work will assist you in designing them for your database and your queries.

Next class we’ll go over statistics.

I wouldn’t walk back to your dorm by way of the shore. I’ve seen some rather odd looking people near the docks lately that didn’t give me a good feeling. See you next time… maybe.

Jul 19 2010

SQL University: Introduction to Indexes, Part the First

Right, all eldritch tomes are to be closed and Elder Signs are to be put away during this course.

Welcome to the History department here at the Miskatonic branch of SQL University. Why the History department? Well, first, because I like history and have frequently thought I would enjoy teaching it. Second, because I needed a hook upon which to hang part of the story I want to tell. What story is that you ask? Why, the story of the Dewey Decimal System. We are interested in studying history and historians must classify our subjects carefully. For advanced students we’ll be covering the Library of Congress Classification System and the…

Right, I give, this is the introductory class on indexes. If you thought we were covering something exciting and sexy like PowerShell, you’re in the wrong room.

Indexes… indexes…. There are, of course, different kinds of indexes. I’m sure that some of you, glancing ahead in your books, are already thinking, “yeah, two.” And you would, of course, be ABSOLUTELY WRONG! That’s why you’re in this class, because you don’t know. There are a large number of different kinds of indexes. Most people think of the standard indexes, of which there are two, clustered and non-clustered. But when pressed they can usually come up with the Full-Text index and possibly even the XML index. But that leaves out Spatial indexes, filtered indexes… more. Microsoft’s documentation lists eight different indexes:

  • clustered
  • non-clustered
  • unique
  • indexes with included columns
  • Full-Text
  • Spatial
  • Filtered
  • XML

But I’ve seen other people count them other ways and arrive at varying amounts. Is a compound index a different kind of index? If it’s not, is unique really a different kind of index? Things to think about.

Why so many? What the heck is an index good for? They must be useful critters or Microsoft wouldn’t have put so many different sorts (however many that is) into SQL Server. I started off talking about the Dewey Decimal System for a reason. An index, any of the indexes we’re going to talk about, is primarily meant, like the DDS, as a mechanism to make finding things easier. That’s all it is. Pretty simple, right? Wrong. You clearly haven’t spent time with SQL Server indexes or the DDS. It’s really complicated. But, just like the DDS, learning how indexes work will make using them much easier.

Remember, the main purpose of a database, despite what your DBA may secretly feel in his or her heart, is not to keep, store and protect data. No, the main purpose of a database is to feed that data up to your business users, whoever they may be, in a timely and accurate fashion. That’s where indexes come in. They will help your queries get the data out to your users faster. Think about your data like a really huge library and your data like a bunch of books. The index acts like the DDS as a mechanism to speed you through the library and quickly and easily retrieve the book that you want.

Enough comparisons, since this is introductory, I just wanted to get the idea of indexes into your head. In the next installment I’ll take on two (or four, depends on how you count them) different kinds of indexes, starting with the standard two that you expected me to cover, clustered and non-clustered indexes. I’ll also introduce the concept of a heap and we’ll talk about what the heck a B-Tree is.

See you next class, probably. Be careful crossing the quad, I’ve heard Wilbur Whately is back on campus and we all remember what happened last time.

Feb 15 2010

nHibernate Database, First Look

I’m getting my first look at a full-fledged nHibernate database developed by consultants for our company. I thought I’d share my initial impressions. I’ll be capturing trace events from the database over the next couple of weeks, so I’ll be following up on the behavior of nHibernate within this database as well.

The first thing I saw & thought was, “Foreign key constraints. Thank the gods.” That really is good news. I was frankly concerned that they might go with the “let the code handle it” approach. There are quite a few null columns. I’m also seeing tons & tons of nvarchar(255) which must the default string size. Lots of bit fields too. They also used bigint in a lot of places too. None of this is definitively good or bad, just observations.

There are tables that lack a primary key. That raises a bit of a concern. The primary keys I’ve looked at have all been clustered, which isn’t too problematic since that’s probably the primary access path. There are a few unique constraints on some of the tables too.

Overall though, I don’t see anything that, at first glance, makes me want to run screaming from the room (or pick up a big stick & start looking for developers).  The devil is no doubt in the details. Time to get started looking at the trace events.

Dec 01 2009

How do -You- use SQL Server

I’ve been tagged by a misplaced yankee, uh, New Englander, whatever. The question is, how do I/we use SQL Server where I work. That’s a tough one. It would make a much shorter, and easier, blog post to describe the things we don’t use it for. However, keeping with the spirit of these tags, I’ll try to lay out it.

For those that don’t know, I work for a rather large insurance company. This means that we have lots and lots of databases, but not much data. We also are cheap. That means we’ll run an app into the ground rather than spend the money & time to replace it. We have apps still running from the 70′s and 80′s propped up by ancient men with pocket protectors, spit, bailing wire and happy thoughts. This also means that we have apps running on SQL Server 7, 2000, 2005 and 2008. Give me a couple of weeks and I’m sure I can get an R2 app deployed. There is also a few Oracle databases, our warehouse and Peoplesoft in particular. We even have a DB2 and, I think, one Sybase database somewhere.

I don’t want to go into lots of details about the type of data we store, lest I get in trouble, but basically, think insurance and you’ll get a pretty good idea of a lot of it. Add in the fact that my company prides itself on engineering to avoid risk and you’ll know that we gather quite a bit of information about the things that we insure. There are lots and lots of very small databases. Our largest db’s are just breaking 100gb, but must are in the 20-60gb range. We have a ton of OLTP systems gathering all the different data. These have been developed in just about every possible way. We even have a couple of systems using nHibernate under development. We move, mostly, pretty standard structured data. We have a few processes that are using XML, mostly from third party sources, to import data, so we’ve learned how to shred that stuff into the database. Spatial data, insurance remember, is the big new thing on the block. We’re seeing lots more implementations taking advantage of this. We don’t see much in the way of unstructured data, but some of the reports from the engineers falls into this realm. We also get quite a few photo’s from them that want us to store. We’re working on using FileStream to keep those in sync with the db rather than storing them within the database itself.

Currently, and I hate this, the overwhelming majority of our OLTP data is collected in SQL Server. All our datamarts used for reporting are in SQL Server. BUT, in the middle sits our Oracle warehouse. So we have to convert our data from SQL Server into Oracle and then back into SQL Server. It’s fun. Swapping data types over & over, shrinking column names only to re-expand them into something a human being can read… It’s lots of fun.

We use SSIS for almost all our ETL processes, although we have a few DTS packages still chugging away on some of the 2000 servers. We’re running a bit of replication, but nothing major. We have several fail-over clusters in production. We’re running virtual machines in lots of places. We’re expanding our Reporting Services implementation pretty rapidly. After attending Buck Woody’s sessions at PASS this year we’re getting Central Management Servers and Policy Based Management implemented.

Most of my time is spent working with development teams. We do most our deployment work using Visual Studio. I do database design, stored procedure writing and tuning, data access methods… you name it and I’m involved with it. The level of our involvement varies from team to team, but by & large, we get involved early in most development projects and are able to influence how databases are developed.

For monitoring we’re using Microsoft System Center Operations Manager (or whatever it’s called this week). We’ve modified it somewhat, adding special rules & monitors to keep an eye on the system better. We also use Idera’s SQL Diagnostic Manager to help keep a more detailed eye on the systems. I already mentioned that we use Visual Studio for most of our development & deployments. We also use several Red Gate products, Compare, Data Compare, Prompt, pretty much every day.

That’s pretty much it. We keep busy and keep the systems up, running & happy most of the time.

Because I think they’ll have interesting answers, I’m going to pass this on to Jeremiah Peschka and Tim Ford.

Sep 15 2009

More Free Training

Quest Connect 2009, taking place in October 21 for 24 hours, looks like it’s going to have 64 different sessions, live and recorded, by a variety of the names in the industry. It’s another chance to dig in and learn the details on a variety of topics from some of the top names in the business. Can you say Tom LaRock? How about Tim Ford? I know you want to hear from Brent Ozar. Those are just some of the featured speakers. There are a whole slew of others, it’s worth pursuing, and did I mention, the price is right.

I recorded a session for them last night. It’s on the basics of understanding execution plans.

Sep 14 2009

Editorial on SQL Server Central

My first one over there. It’s discussing whether or not you should do two things, build your own monitoring tool, come out in particular favor of one tool or suite of tools from a single vendor. Please read it and watch the video. And, even more importantly, leave a comment in the discussion.

Sep 09 2009

Spools in Execution Plans

I got the question the other day, when are you likely to see a spool in an execution plan? Easy, whenever SQL Server needs to walk through the data multiple times, usually in JOIN operations… Yeah, well, once again, my flip answers are not quite the entire story.

Spool operations are temporary storage of the data for later reuse in a query plan. There are two types of spool operations, eager spool and lazy spool. A spool is basically a temporary table created within the execution of the query that is used when it’s likely that data will be needed again, and again during the execution of the query. This is not an explicit #temp temporary table, but a work table for operations within the processing necessary for a given query’s behavior. A spool is created when the optimizer thinks that it can work better with a semi-permanent sub-set of data rather than have to perform multiple seeks or scans against a table or index or in other places where data re-use is important (more in a bit).

So how does this work? Take a look at a simple query:

UPDATE Person.Person
SET FirstName = 'Ted'
WHERE FirstName = 'Ted';

When the execution plan for this query is generated, it looks like this:

EagerSpool

In this case, an eager spool is used as part of the roll back mechanism and to prevent the Halloween scenario. An eager spool is one where the data is retrieved immediately.

It’s possible to see the other type of spool in a query that looks like this (straight out of the Books Online):

WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
(
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID
)
SELECT ManagerID, EmployeeID, EmployeeLevel
FROM DirectReports ;

Which would result in this execution plan:

LazySpool

Now you see a table spool that is called a lazy spool. This means that it only loads data as the data is requested. This makes a lot of sense because the lazy spool is operating as the means for gathering the recursive data together. So it’s not going to go and get all the data available, like an eager spool. Instead it’s going to only load the data as needed, lazy.

These two scenarios are much more likely than the typical join to show a table spool. Yes, it can, and does, appear in join operations, but as I said at the beginning, that’s such a flip answer. Much better to try to be complete.

Jun 08 2009

Microsoft SQL Server Premier Field Engineers

Joe Sack has started a new team blog for the Microsoft SQL Server Premier Field Engineers. If you don’t know who they are, you should. The first post is just introductory, but this blog is likely to become a great resource. These are the guys that MS zip lines into tough situations with the expectations that they’ll improve them. I’d strongly suspect these are fellows worth listening to.

Mar 30 2009

Reading to Learn

I just finished chapter 1 of Alastair Aitchison’snew book on SQL Server spatial data, “Beginning Spatial with SQL Server 2008.” If this is the beginners book… oh boy. The advanced book must be insane. Seriously though, Mr. Aitchison seems to have written a fantastic book. I’m going to tear through it as fast as I can because I’ve got two projects that are looking to start using spatial data and quite frankly, I’m a bit lost.

There’s a great discussiongoing on over at SSC as to the worth of technical books for DBA’s. It’s based on this editorialby Tony Davis. I’m surprised by the number of people who say they don’t use books. It seems that a lot more people use blogs and articles and discussion groups to learn. Maybe I’m showing my age a bit, but I don’t think a blog post or an article is going to get the depth and knowledge that Mr. Aitchison is displaying in this book. I know I’m regularly opening Kalen Delaney’s Inside SQL Server 2005 (and the new one for 2008 just came out) to look up bits & pieces of information that just isn’t as readily available on the web. Also, it’s worth pointing out, except for the editing that comes from people who read this blog, no technical review is done of this information. I might be right about the things I post, but I could be VERY wrong. Same with any other blog you read, including blogs by the big names. Despite the errors that creep into books (and trust me, they do), books are very carefully scrutinized by multiple sets of eyes to try to catch those errors prior to publication. They miss some, but they try not to miss any. Few blogs are like that. Not that many technical publications are terribly strict about technical accuracy either. I generally find more good information in the right books than anywhere else.

End of rant. I need to get back to reading this excellent book.