Extended Events, the system_health Session, and Waits

I advocate for, use, document, teach, and just downright love, Extended Events. They are so much better than the old Trace Events (aka, Profiler) that it’s sometimes difficult to keep from just gushing. Let’s talk about a common situation that you’re going to run into on your servers all the time and how you can put Extended Events to work to help you, without actually doing any work at all.

What’s that? Be lazy and get rewards? Yes.

The Extended Events system_health Session

On your servers, any of them that are SQL Server 2008 or newer, right now, unless you’ve performed actions to prevent this, you’re running the Extended Events system_health session. It’s just happening, currently, on all your servers. Nothing you need to do about it at all. I’ll be a lot of you never even knew it was there.

If you follow the link you can see all the various types of information being gathered by the Extended Event system_health session. I won’t detail all of it here. Let me just provide a little context around how the session works. First and foremost, similar to the error log, this session consists of four files, each 5mb in size, rolling over as they get filled. For systems with a very high degree of activity, that means the information here may only be hours old. However, for most of us, this provides days, if not weeks worth of information about the behavior of your system.

You’ll need to know where the files are located so that we can query them or open them up in the Live Data window. Here’s a simple query that will give you the path on any system:

SELECT dosdlc.path
FROM sys.dm_os_server_diagnostics_log_configurations AS dosdlc;

Now there are a ton of reasons why you should be taking advantage of the system_health session (deadlocks for example), but I’m going to focus on one, blocks.

Waits In The system_health Session

To see this information in action, we can set up a really simple query. Run this from two different query windows and let it sit for about 45 seconds or so (it only needs 30, but for our purposes, add a little padding):

BEGIN TRAN
UPDATE dbo.ErrorLog
SET ErrorTime = GETDATE()
WHERE ErrorLogID = 1;

After a little while, roll back both scripts. Then, we can run this script to take a look at the system_health information specifically on the waits caused by these two queries blocking one another:

DECLARE @path NVARCHAR(260);
--to retrieve the local path of system_health files 
SELECT @path = dosdlc.path
FROM sys.dm_os_server_diagnostics_log_configurations AS dosdlc;

SELECT @path = @path + N'system_health_*';

SELECT CAST(fx.event_data AS XML) AS Event_Data,
       fx.object_name
FROM sys.fn_xe_file_target_read_file(@path,
                                     NULL,
                                     NULL,
                                     NULL) AS fx
WHERE fx.object_name = 'wait_info';

The results on my system look like this:

<event name="wait_info" package="sqlos" timestamp="2018-06-01T12:04:18.915Z">
  <data name="wait_type">
    <value>5</value>
    <text>LCK_M_X</text>
  </data>
  <data name="opcode">
    <value>1</value>
    <text>End</text>
  </data>
  <data name="duration">
    <value>92115</value>
  </data>
  <data name="signal_duration">
    <value>0</value>
  </data>
  <data name="wait_resource">
    <value>0x0000002c2d3fb690</value>
  </data>
  <action name="sql_text" package="sqlserver">
    <value>BEGIN TRAN
UPDATE dbo.ErrorLog
SET ErrorTime = GETDATE()
WHERE ErrorLogID = 1;
</value>
  </action>
  <action name="session_id" package="sqlserver">
    <value>71</value>
  </action>
  <action name="callstack" package="package0">
    <value>sqllang.dll+0x000000000053FF82
sqllang.dll+0x0000000000541241
sqllang.dll+0x0000000000540669
sqldk.dll+0x00000000001055E5
sqldk.dll+0x00000000000025E1
sqldk.dll+0x000000000000148F
sqldk.dll+0x000000000000354B
sqlmin.dll+0x0000000000067449
sqlmin.dll+0x000000000006720B
sqlmin.dll+0x000000000000CF48
sqlmin.dll+0x000000000001391E
sqlmin.dll+0x0000000000072EA6
sqlmin.dll+0x000000000002E7AF
sqlmin.dll+0x000000000002E595
sqlmin.dll+0x0000000000031786
sqlTsEs.dll+0x0000000000001F17
sqlmin.dll+0x0000000000059010
sqlmin.dll+0x000000000003F6BF
sqllang.dll+0x000000000001817B
sqllang.dll+0x000000000002B86F
sqllang.dll+0x000000000002B61F
sqllang.dll+0x0000000000B8BFDB
sqllang.dll+0x0000000000B7D7CB
sqllang.dll+0x0000000000013A74</value>
  </action>
</event>

You have the query, the wait_resource, the duration, in short, everything you need to start identifying why you had excessive blocking on one of your servers and all you had to do was reach out and grab it.

I’ll leave parsing the XML to others (check out this book, just starting to read it).

Conclusion

Really, there was a query that ran unusually long yesterday? I wonder if it was blocked and waiting on resources? Let me take a quick look at the Extended Events system_health session. It really is that easy. This is a free, easily accessed resource that is available to you right now. Please, use it.


Want to see a WHOLE bunch of other tricks and methods with Extended Events as well as a slew of other tools for dealing with SQL Server performance? I’m putting on an all day seminar in various locations. Please follow the links below to take advantage of this great learning and sharing opportunity:

For SQLSaturday Indianapolis (and my first time speaking in Indiana) on August 10, 2018. Please go here to sign up.

I’ll be at SQLSaturday Sioux Falls (and my first time speaking in South Dakota) on August 18, 2018. Please sign up here.

For SQLSaturday Oslo on August 31, 2018. Click here right now to register.

5 thoughts on “Extended Events, the system_health Session, and Waits

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.