
Joe Fleming wants to know how I solve weird problems. I’m not sure I’m actually qualified to answer this question since I, myself, am a weird problem, but I’ll give it a shot.
One aspect of my job is to provide support to our clients. No, I’m not on call (thank the gods), but I’ll get roped in for, well, the weirder ones. No, no, not the weirder clients (although…), the weirder problems. Recently we had one, not exactly weird, but it did take a few odd turns along the way.
The Problem
A client asked us to tell them when a query ran long. Simple. We have a long running query alert, all built in to Redgate Monitor, so, done. No, see, we like getting alerted when queries run long, but not really long, plus we’re more concerned with just one database.
OK, well, that takes the long running query out of the picture (although, we could change the threshold for this one db). It’s fine though. We can capture all queries that run longer than a particular value for a database using an Extended Event… Not just a database. A particular query.
OK, again, probably not a problem, Extended Events can be filtered… It’s from Fabric.
Ohhhh, OK. Oh no. I mean, OK, sure. Uhm…
“Right Mr. Smarty pants. Say Extended Events now.” You know I can hear you all right?
So back to the client. Is the query generated in Fabric? If so, is the SELECT statement relatively stable? I’m thinking, if I can at least get a stable string to search on, I can filter an Extended Events session using a wild card on the text, even if the WHERE clause, heck, the JOINs, are constantly changing. If our SELECT is stable, this could work. It’s a stored procedure.
AWESOME!
I mean, OK, no worries. We’ll use rpc_completed, filter on the database and the object name and the duration. If it normally runs in 3 minutes, we’ll look for anything greater than that value. Done.
And we are. The plan is good for the client.
Ah, but I’m not done.
The Solution
We already listed the solution, right? rpc_completed with a couple of filters? Yeah, but, think about what we’re capturing and why we’re capturing it. Do we need to output our Extended Events to a file? Does that file need to store duration, reads, writes, the SQL? Because remember what the client wants, and it makes sense, did we see any of these go over. Not, when they go over, give me all the metrics. Just, let me know if it did go over, I’ll handle it from there. Let’s be minimal on this. So… just a count then.
AH-HA!
Time for one of my favorite little bits of Extended Events. The histogram target.
The histogram target is silly simple, but also really cool. Here’s how I defined the target for the client:

When you create a histogram target, first, you get to pick which event you’re looking at. Our session only has a single event, so that’s easy in this instance. Then, you get to decide whether to use Actions to supply your counts, or Fields from the event itself. Actions, I love the name, can add overhead, so, let’s go with a Field. Which Field? Since we’re already filtering by the object_name, we really could use any of the fields, but, I went with object_name again, just because it makes it easy enough to understand what we’re counting. I could have gone with object_id and arrived at the same place. So anyway, the histogram will now simply count all the occurrences of the rpc_completed event from the session and group those counts by object_name. All that’s left for us to do is write a query to retrieve the data:
SELECT xed.slot_data.value('(@count)[1]', 'int') AS longrunningcount
FROM
(
SELECT CAST(xet.target_data AS XML) AS target_data
FROM sys.dm_xe_session_targets AS xet
JOIN sys.dm_xe_sessions AS xe
ON xe.address = xet.event_session_address
WHERE xe.name = N'LongRunningQuery'
AND xet.target_name = N'histogram'
) AS t
CROSS APPLY t.target_data.nodes('//HistogramTarget/Slot') AS xed(slot_data)
WHERE xed.slot_data.value('(value)[1]', 'nvarchar(256)') = N'TheQueryInQuestion';
You’ll note, I’m just going for a count. I do filter on the object_name value in the histogram (and I could have done that for the object_id as well).
And now we’re done.
Almost. You would want to reset the histogram, so you’d have to start and stop the session. Only way to reset the count.
Conclusion
OK, not that weird of a problem, but I liked the idea of running through my thought processes as we worked through the problem. Yeah, you could say I was overly focused on Extended Events as a solution, to which I plead guilty. However, as soon as you say you want to measure query performance we have three choices. First, DMVs, all the way. But they are aggregated totals and knowing that a given query ran long at a given time is hard there. OK, choice #2 is Query Store. We do again have to deal with aggregation, but it’s aggregation in chunks, 60 minutes by default. This one might work, just look at the preceding 60 minutes, see if the max length is greater than 3 minutes. Mission accomplished. And really, that would probably work as well as my solution. However, mine is targeted in the extreme. I can ensure that the rpc_completed event only fires as needed and I can get a precise count.
How’s this working for the client? It’s not. We hit some other requirements and went a different direction. We’re still testing. If there are issues, I’m going to bring up Query Store, see if that’ll work.
It’s generally less painful to filter on the .exist method than the .value one, e.g. WHERE xed.slot_data.exist(‘(value)[1][.= “TheQueryInQuestion”]’) = 1;
Oooh! Cool. Thanks.