Oct 17 2011

Extended Events Data

I’ve been working quite a bit over the last week or so with extended events in Denali. The sheer magnitude of what you can do with extended events is just becoming clear to me. The interesting thing though is how much the basics are similar to trace. Similar mind you, not the same. For example, the best way to gather trace data is to output it to a file and then read the file into a table for later querying. It’s the same with extended events. There’s even a function that acts as a table:

SELECT *
FROM sys.fn_xe_file_target_read_file ('C:\APath\Query Performance Tuning*.xel', NULL, NULL, NULL);

This can take advantage of roll-over files just like the old function used for traces. You can also provide offsets to read a sub-set of the file, which is cool. The event fields come back within an XML column. Here’s a sample of how I’m dealing with that from the new version of my Query Performance Tuning book:

WITH xEvents AS
(SELECT object_name AS xEventName,
CAST (event_data AS xml) AS xEventData
FROM sys.fn_xe_file_target_read_file
('C:\Apath\Query Performance Tuning*.xel', NULL, NULL, NULL))

SELECT xEventName,
xEventData.value('(/event/data[@name=''duration'']/value)[1]','bigint') Duration,
xEventData.value('(/event/data[@name=''physical_reads'']/value)[1]','bigint') PhysicalReads,
xEventData.value('(/event/data[@name=''logical_reads'']/value)[1]','bigint') LogicalReads,
xEventData.value('(/event/data[@name=''cpu_time'']/value)[1]','bigint') CpuTime,
CASE xEventName WHEN 'sql_batch_completed' THEN
xEventData.value('(/event/data[@name=''batch_text'']/value)[1]','varchar(max)')
WHEN 'rpc_completed' THEN
xEventData.value('(/event/data[@name=''statement'']/value)[1]','varchar(max)')
END AS SQLText,
xEventData.value('(/event/data[@name=''query_plan_hash'']/value)[1]','binary(8)') QueryPlanHash
FROM xEvents;

yeah, XQuery isn’t my strong point. However, you get the idea. Oh, and the CASE statement is necessary because the rpc_complete code is visible in the statement field, but the sql_batch_complete code is visible in the batch_text field. That’s just Microsoft keeping us on our toes.

Want to talk about this and other ways of gathering performance metrics? Please come and visit during SQL In The City:Los Angeles on October 28th.

Oct 05 2011

Extended Events and Performance Tuning Knowledge

I’m working on updating my book, Query Performance Tuning Distilled, so that it reflects the new things available in SQL Server vNext:Denali. I’m going through the first chapters that are all about gathering information about your systems. Performance tuning is all about building up knowledge of how the system is working in order to understand what you need to change in order to improve it. I’m surprised by how much hasn’t changed. But some of the changes are fundamental and huge. Let’s talk huge. Extended Events is huge.

Extended Events came out in SQL Server 2008, but very few people, myself included, paid much attention. Those who did found the implementation awkward and confusing. Only a few people persevered enough to discover just how powerful and amazing these things are. Which is why most anyone who wants to learn about extended events should plan on starting at one place, Jonathan Kehayias’ blog. Yeah, the Books Online help get you started, but Jonathan really makes it all take off.

But why is this huge? I’m not going into a low-level discussion here, because frankly, I’m incapable, for that go back to Jonathan. What I will say is that you have a mechanism that captures events at their source instead of after the fact. That makes it faster and causes less impact on your system. Further, you can put filters in place at the event level that make it so that you only capture the information you want, when you want it. In short, it’s about control, reduced impact and increased knowledge.

So why write about it now? Because Denali has finally provided a GUI that simplifies the process of setting up extended events. There’s a wizard for building extended events sessions, and a session management GUI that you can use to manage them on your own. Basically, you have a way to get past the confusing and odd TSQL implementation and start using these things.

Check it out:

image

It really does provide everything you need to get going with extended events… or does it? There’s still the issue of knowing how these things work and what they do. The GUI above enables you, but it doesn’t provide much guidance. For example, there’s a very enticing little action (stuff collected from events are called actions or fields) called sql_text. When setting up my first, simple, session (the one you see above), I saw that action and thought, ah-ha, there’s my collection point for getting the sql batch and the procedure call. But then, on testing, it didn’t work. After a plaintive call for help on Twitter (hash tag #sqlhelp for those who don’t know), Jonathan rode to my rescue and informed me that despite the enticing name, that’s not what I wanted.

Instead, each event has event fields that are simply a part of the event. Global Fields (actions, whatever, told you extended events were confusing), have an added overhead, so you have to think twice about using them. Plus, sql_text didn’t return any data for sql batch calls. No, instead, for the rpc_completed, I needed to read the field, statement. Excellent! All set. Well, no. There’s still another wrinkle. For sql_batch_completed there is no statement field. Instead, I need to collect sql_batch_text. Ah, good…. but wait, now I have two fields that get returned and I have to figure out which of these I use instead of just going to a single source like I used to with trace…grrr… Why does Microsoft hate me? And you?

Anyway, I’ll be weaving extended event sessions throughout the book. You can use these things for looking at waits, deadlocks, ooh, all kinds of stuff. I did say powerful and huge right. Keep an eye out for a few more blog posts.

Also, if you want to talk to me about gathering knowledge for your performance tuning efforts, then you should make your way to SQL In the City: Los Angeles. I’m going to be there with a slew of other MVPs on October 28th. It’s free. Click here now to register. My session is called Performance Tuning With Knowledge, and it’s all about gathering the information you need to make your performance tuning decisions.

May 04 2011

SQL Rally

ImSpeakingThe Rally is next week. It’s not too late to register. This is going to be a very solid event with excellent opportunities for learning and networking. If you’re on the fence about going, don’t be. You should attend.

I even have a few (somewhere near 10) slots open in my pre-conference seminar, Query Performance Tuning: Start to Finish. I’ve been working on hard on this presentation and showing pieces of it to various user groups around the country. The reception so far has been very good.

If you want a seven hour brain dump on the ins and outs of performance tuning your queries, I strongly recommend it. Since there are still openings, you can get in, but I’d act quickly. Not because they’re all likely to fill, but because, it takes place in one week’s time. You need to plan for the trip.

Once you’re in town, you should stay of course for the next two days. There are going to be tons of fun things to do, people to learn from, other to talk to, sessions to attend. You might not want to missing the lightening talks either. I’m going to do one called “Testing Your Backups: A Rant.” You won’t want to miss it. There are going to be a bunch of evening events too, so we’re really stretching out the cost of registration. There will be only one reason to go back to your hotel, and that’s to pass out in preparation for the next day.

Apr 14 2011

Performance Tuning: Start to Finish

ImSpeakingThe very first ever SQL Rally is taking place in a little less than four weeks in Orlando Florida. It’s going to be quite the event. There will be two full days of sessions on any number of topics. But before that all starts, there’s going to be a set of full day pre-conference seminars. These too are on a number of topics, but I’m hoping to draw your attention to just one, mine.

I’ve put together a seven hour session on query performance tuning. I’ve tried to make it as complete as I possibly can. I’m going to cover the whole process from collecting data on your machines to identify where problems may be, to understanding the optimizer so you know how things work, to reading execution plans so you can identify issues, to various methods of fixing all sorts of different performance problems. In short, performance tuning, start to finish. Here are the specific things that I hope to communicate to you with this seminar:

1. The ability to collect performance metrics on their servers as part of an overall query tuning methodology

2. The ability to generate execution plans from multiple sources in support of troubleshooting poorly performing queries

3. An understanding of how the optimizer works in support of writing better TSQL code as well as troubleshooting poorly performing queries

4. A working knowledge of DMVs that will help them identify and fix performance issues on their servers

5. The ability to address common query performance problems

That’s it. If you get all or part of these five topics, it’s a win. This is probably an low to intermediate level class. It’s not high level. If you’ve been teaching performance tuning to experts for years, you’re probably not going to get much out of this class. If you do want to prep for the class, I’d recommend getting a copy of my book, “SQL Server 2008 Query Performance Tuning Distilled.”

There’s a good chance this seminar will sell out. If you’re interested, please click here to register soon in order to ensure that you have a spot.

I hope to see you there.

Aug 11 2010

24 Hours of PASS: Summit Preview

Registration is open for the second 24 Hours of PASS this year. This one is going to be a preview of the Summit itself. So all the sessions are tied, in some manner, to sessions being given at the summit.Here’s a link to go and register.

I’m very excited to be able to say that I’ll be presenting in this 24HOP. One of my presentations at the Summit this year is Identifying and Fixing Performance Problems Using Execution Plans. It covers pretty much what it says, methods for fixing performance problems by exploring the information available within execution plans. But, how do you know you have a performance problem? That’s where my preview session comes in. Identifying Costly Queries will show you several ways to gather metrics on your system so that you can understand which queries are causing you the most pain. Once you know which queries need tuning, you can use execution plans to tune them. Whether you’ll be attending the PASS Summit or not, and whether or not you’ll go to my session once you’re there, I think this 24HOP session will be useful to help you understand where the pain points are within your own systems. I hope you’ll attend.

More importantly though, check out all the other great sessions. This is an excellent collection of presenters and presentations. For anyone who has ever said “PASS doesn’t do anything for me,” I want you especially to take a look at the amazing training opportunities being offered by PASS, for free. The volunteers that run PASS do amazing things and this is just one of them. Take advantage of this opportunity and, hopefully, recognize that PASS is doing things for you. This just barely scratches the surface of all that PASS offers.

Apr 16 2010

Confio Ignite: Part II

I’m continuing to evaluate Confio’s Ignite database monitoring tool. I’ve had it collecting data on a couple of production servers for about a week now. Based on what I’ve seen so far, it’s looking like a pretty good piece of software.

Breaking with the usual tradition, I’m going to talk about the things I’m not crazy about with the software, before I start singing its praises. The first thing, that I thought was missing, but is actually just hard to find, is the ability to look at the query information that Ignite collects, broken down by database. It looks like you should be able to get to it by looking at the Databases tab, but instead you have to first drill down into a time-period, then select specific databases within that time period, which will show you the queries by database. I know that in my environment, it’s this database listing of queries that is probably going to get used a lot. Tracking it down required help from the Confio team, which they quickly provided. They also showed me a way I could run queries to get this from their database so that I could create a report.

Speaking of reports, because they support Oracle & DB2 as well as SQL Server through their interface, no Reporting Services. OK, I recognize that I’m more than a bit Microsoft-centric when I say this, but I’d like to see SSRS reports so that I can manipulate things the way I want to. Again, not the end of the world, but it’s just something I don’t like. Because the data store is available though, I can get in there and create my own reports to my heart’s content and, if I like their report, I can always run a trace to capture the query and see how they built it so that I can build one of my own that mirrors it. I think they provide a mechanism for customizing reports by building some XML that will add it to the interface, which is a bit of a pain, but shows they’re on top of things.

One other thing that bothered me as I worked with Ignite is that, in the real time monitoring section, it was hard to find my way to the list of locks. It had the list, but it just wasn’t obvious how to get to them and this is something I’m used to looking at when I’m worrying about real time performance issues.

Right, enough talking about things I don’t like. See this? I love this:

That’s a break down, by database, by day, of the cumulative waits on the system. Yeah, that little pink database is causing all sorts of problems on one of my production systems. I actually already knew this was a problematic database, but I wasn’t that precisely aware of when and how it had issues. You can drill down to see the same thing for a given day:

And it’s not just the pretty pictures either, showing that most of our production load is in the morning, but that there are some odd spikes at midnight, 4AM & 7PM, but there’s data available behind the graph. If you drill down and hover over the graphs, pop-ups like this one appear:

(Names have been blacked out to protect my job)

And it’s the focus on wait times and types provided by the trending views and reports that make this a very strong tool. Ignite also collects pretty standard performance metrics, buffer cache hit ratio, memory paging, etc. And you can customize which of those metrics it collects and which ones you display, all on your own. But almost any decent monitoring tool does that. I use a tool that does that, and might even do that a bit better. No, what seperates this tool from the pack is that ability to identify the wait states and the query that is causing them. That’s what will get you excited when you’re using Ignite.

It also has a little ping mechanism that shows response time on the server, a helpful and friendly little reminder that all performance isn’t around what’s happening on SQL Server, but what’s happening across the enterprise.

Big chunks of the interface are customizable and you can add or drop things using the little icons on the right side of the picture above.

I can keep going with all the stuff that’s good, because it’s a long list, but I’ll summarize what I like so far. I like the focus on wait states, a lot. I like the focus on query performance. Between those two I feel like I’m getting the majority of what I need from the tool, and more than I get from other, similar, monitoring tools. The fact that it does all the other standard data collection on top feels like gravy. The problems I have with the software so far are either personal issues, Reporting Services, or fairly minor problems. I have a monitoring product running against the same servers as Ignite and I haven’t seen any impact from Ignite’s monitoring showing up there, so it looks like this is a benign monitoring software.

For my final installment, I’m going to look up the skirt of Ignite and see what the underlying data structure looks like. The ability to gather and present data is all well and good, but more important is the ability to take that data and make serious use of it through reports or by building other data structures out of it for long term storage and reporting.

Apr 06 2010

Confio Ignite: First Impressions

One of the national sponsors for SQL Satuday is Confio Software. Just because they sponsor SQL Saturday, it’s worth checking out their product, but they also host the speaker’s dinner at these events. So, having eaten off their dime, I felt obligated to take a look. I’m glad I did.

Confio Ignite is a monitoring software that keeps real-time and historical track of the performance of your SQL Server (and Oracle and DB2) database servers. It’s focus is on wait states and queues, a very common method for troubleshooting performance.

You can get a trial download from their web site to run for a couple of weeks yourself. Everyone’s first impression of a software, after the web site of course, is when go to do the install. The install routine for Ignite was very easy. It’s also a bit of a shock. It actually uses web pages for configuration. This makes sense when you consider that the software originally comes from Oracle and Unix systems. Anyway, despite the bit of weirdness of using a web page for configuring software, it’s a very easy install. Once the install is complete, it’s also really simple to add servers for the software to monitor.

I put it to work on a number of development servers to start (I’m not plugging some unknown software into Production without a shake-down. I may be stupid, but I’m not crazy). It’s agentless and it began running queries against the systems immediately. It’s storing the data to my local desktop. That data is available in real-time, a historical view, or a trending view, showing how performance is changing over time. It’s all through a web site, not dissimilar to Quest’s Foglight. You can see a list of servers, their immediate state, action choices and some other general info.

Clicking a server, after it’s had time to gather data, put’s you into the Trend screen. It’s laid out very nicely. You get to break down performance by SQL, Waits, Programs, Databases, Machines, or Users. Then it shows information by Total Wait, Average Wait or Typical Day. The chart gives you a visual representation of queries running against the system, by default numbered and assigned by an internal hashing system that Ignite uses (no, I don’t know what the internals look like, I’m inferring that from the information presented). Down at the bottom of the screen you can see Top Query Problems, which shows you queries that are running slow or using too many resources according configurable settings, or Resources, your standard breakdown of CPU, Memory Paging, Disk Queue and the rest, or SQL Text, literally the queries that have been running on your system.

Almost everything has a drill down that takes you to a lot more information, and there are pop-ups that show you bits of detail.

I’ve only been running it for a few days, but I can already see how it could be very useful. I’m very happy with the focus on waits and on queries. So often when you’re looking at these tools queries are either an after thought or ignored completely. You frequently have to spend time monitoring your queries on your own, while that very expensive monitoring software is tracking page life expectancy and buffer cache hit ratio, both useful measures, but you still need to know which query is causing the problems. Ignite will get to that information right away and still collect the other as well.

So far, I’m pretty pleased with the software. I’m going to switch this over to a production system in the next few days and see what I can see with Ignite running against a real system. More to come.

Apr 05 2010

nHibernate, First Look at TSQL

I’ve blogged in the past about the nHibernate project that has been going on at my company for the last eighteen months. Prior to today, I have only seen the database generated by nHibernate. But today I finally started seeing some TSQL code. My first impressions… oy!

There are two levels of concern coming out of the gate. First, it appears that some of the programming decisions, completely independent of nHibernate, are going to cause problems. Second, it appears we’re going to be hitting issues with nHibernate.

First, the programming decision. I’ve been using Confio to monitor this server for a few days now (look for upcoming posts on my eval of Confio). Only one day has captured any real activity from the nHibernate team (and yes, I’m basically spying on these guys because they are absolutely insistent that my team stay out of their way). The longest running query was two calls to this (names have been changed to protect my job, there are no innocents):

SELECT BookID
FROM Books WITH (updlock,rowlock)
WHERE BookID = 42

 What they’re doing is locking the row so that no other user can get at it while it’s being edited. Total lock time for the first day was 38 seconds for two calls. I’ve posted about this previously, so I won’t go into it again, but this is potentially a disaster.

On to nHibernate. The next query was pretty benign:

SELECT OrgID
,OfficeID
,StartDate
,EndDate
,OtherID
FROM SpecialList
WHERE OfficeID = 42

Unfortunately this table, as created out of nHibernate, has no primary key, no clustered index or any other type of index, so this is just a table scan. But it’s only on 109 rows… in dev, not production, oh, and most of the 109 rows have a null value for OfficeID, but it’s all good I suppose… until I notice that this query also had 38 seconds of wait time, but it was called 394,652 times… oy (and no, I’m not even one of the chosen, but that’s a wonderful expression for this situation). Looking at the data in cache, this query has been called, since it was created in cache on the 2nd, 598351 times with a total elapsed time on the server of 5429689ms. The average then is 9ms, but the max was 29296 or 29 seconds.

The next query up looked like this:

SELECT col1
,col2
,col3
FROM Table1
INNER JOIN Table2
ON Table1.ChildID = Table2.OtherID
WHERE tabl2.ParentID IN (@p1,@p1,@p2…@p99)

Yep. 99 parameters passed and used against an IN statement. I can’t post the execution plans on this stuff without talking it over with the boss, but suffice to say, it’s two table scans and a merge join to get the data back. Oh, and this was called 1,138 times with a minimum elapsed time of 976ms. Yeah, minimum time to call this proc is 1 second and it’s been called over a thousand times.

It went down hill from there. I’m looking at other queries, one that hits seven tables, but instead of using JOINs uses a bunch of sub-selects in the WHERE clause. It has a minimum run time of 9 seconds. Luckily it’s only been called 64 times.

This is a horror show.

A lot of benefit can be reaped if we’re able to go in and create some indexes on these tables, but that’s going to be an issue anyway because I’m seeing lots of implicit data conversions on the parameters being passed in, as I noted a couple of years ago. At this point in time, I would say, at best, I’m very underwhelmed by what nHibernate is producing. To be perfectly honest, I’m concerned for the health of the project.

Sep 21 2009

Identifying Frequently Running Queries

It’s not enough to look at the longest running query. What if you have two queries, one runs for 5 minutes and the other runs for 5 seconds. Which do you tune? The first one, of course. But, let’s add a little information to our thought experiment. The one that runs for 5 minutes is called 2AM, your local time, when there are almost no users on the system and the one that runs for 5 seconds is called 40 times a minute all day long. Now, which one do you tune? That’s right, suddenly that 5 second query takes on a HUGE significance based on how often it’s called.

But how do you know how often a query is called? The easiest way to gather that information is not necessarily the best in terms of a long term methodology, a server side trace that captures RPC and SQL Batch completion events. The beauty of the server side trace is how easy it is to set up and how much information you can gather, so quickly. You can immediately start getting every query run on the server along with how long they took to run, how much CPU each used and various other metrics. You can run this for short periods or long and then gather the data together into a table in a database and run aggregation queries against it to generate reports showing the most frequently called query. Simple, quick easy. But there are a couple of problems with it. First, while you can run it 24×7, you’re going to see quite a bit of data generated from the events, depending on the load on your system, of course. Second, the queries that come through will likely have all kinds of stuff associated with them, various paramter values and what not, that will make aggregation hard. You’ll have to put together a means of cleaning the data and maintaining it or purchase one of the many third party tools for doing this. You’ll have to store and manage large amounts of data as it comes into the system. It’s going to be a lot of work.

Another option, one that will require a degree of setup, is a little less accurate, it’s unlikely to capture every single execution of every single query, but it will get the majority, sys.dm_exec_query_stats. This handy little dynamic management view carries aggregate data about all the queries currently in the cache. It will show how much accumulated processing time a query has taken, the last time, the longest time, the shortest time and lots of other information, but the one interesting value we would be most interested in here, is the usage count. If you just run a SELECT statement against sys.dm_exec_query_stats, you’ll get interesting data back, but you won’t see your query or your execution plan. To make the information available within the DMV, you’ll need to combine it with other DMV’s. Here’s a sample query that pulls back information that could be useful for identifying badly performing queries on your system including how often they’re called.

SELECT dest.text
,deqp.query_plan
,deqs.execution_count
,deqs.creation_time
,deqs.query_hash
,deqs.query_plan_hash
,deqs.total_elapsed_time
,deqs.total_clr_time
,deqs.total_physical_reads
,deqs.total_worker_time
FROM sys.dm_exec_query_stats deqs
CROSS APPLY ( SELECT *
           FROM sys.dm_exec_sql_text(deqs.sql_handle)) dest
CROSS APPLY ( SELECT *
           FROM sys.dm_exec_query_plan(deqs.plan_handle)) deqp

There’s a lot you could do with this query, but the real power is going to come from gathering this data on a regular basis. It’s going to depend on how volatile the cache is on your system. If most of your queries sit in the cache for weeks (and that’s possible), then you could run a query once a week to gather the information. If your cache is getting flushed clean once an hour, you’d need to run this query more often. Regardless of how often you run the query, understand that, because you’re dependent on the cache, it is possible to miss data. Once you get the query running on your system, you can use the information to determine which of the queries needs to be tuned, not just on how long it runs, but also how often it runs, or even using other combinations of the information returned.

One other point, you may see ad hoc queries on your system. Here’s a simple example:

SELECT TOP (5) sod.*
FROM Sales.SalesOrderDetail sod
WHERE sod.ProductID = 772
ORDER BY sod.SalesOrderID ASC ;

If the value passed to sod.ProductID is changed to 773 or any other value, you will get a different entry in cache for this procedure. That’s where you can put the query_hash or the query_plan_hash values to work. These values show a hash that SQL Server creates for each query that it stores in cache. You could group by these values to aggregate ad hoc queries that would, if they were stored procedures, be identical executions and arrive at an aggregated count.

So, if you need precise information, you can use trace events, but, pay for that with a lot of data that you have to manage.

EDIT: Changed deqs.total_elapsed_date to deqs.total_elapsed_time as per comments below.

Sep 14 2009

Editorial on SQL Server Central

My first one over there. It’s discussing whether or not you should do two things, build your own monitoring tool, come out in particular favor of one tool or suite of tools from a single vendor. Please read it and watch the video. And, even more importantly, leave a comment in the discussion.