Characters

No, I’m not talking about a Dickens novel. I’m talking about the number of characters in a string. I had a painful time recently because of the word “characters.” 

If you take a look at the dynamic management view sys.dm_exec_sql_text you can get the queries that have been run on your system that are still in the cache. It’s a great utility. Better still, you can get specific statements from the code that are actively running through sys.dm_exec_requests or ones that have run through sys.dm_exec_query_stats. To do this is very simple. Each of these DMV’s has a pair of columns, statement_start_offset and statement_end_offset. These columns, and I’m quoting directly from books online measure the “number of character” offset from the beginning of the SQL string and from the end of the SQL string. Using these values you can retrieve an individual statement out of a stored procedure that has multiple statements.

But… Here’s where things get tricky. Try this on your machine:

SELECT SUBSTRING(dest.text, (der.statement_start_offset ) + 1,
(der.statement_end_offset - der.statement_start_offset) + 1)
,LEN(dest.text) AS CharLength,
der.statement_start_offset,
der.statement_end_offset
FROM sys.dm_exec_query_stats AS der
CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS dest
WHERE der.statement_end_offset > -1

You might get an error or you might get a bunch of really odd looking statements in the first column, starting part way into TSQL and cutting off after they’re done or before they’re over. It’ll look odd. But what’s the deal? The SUBSTRING function should work. Logically it’s configured correctly. Here’s the problem.

The [text] column in sys.dm_exec_sql_text is of the datatype NVARCHAR(MAX). Unicode. If you look at the length of the text, it’ll tell you exactly how many characters you see in the string that called to your server. But, the statement_start_offset and statement_end_offset are measuring something different. They’re not measuring characters, they’re measuring unicode characters. Try this query instead:

SELECT SUBSTRING(dest.text, (der.statement_start_offset / 2) + 1,
(der.statement_end_offset - der.statement_start_offset) / 2+ 1),
LEN(dest.text) AS CharLength,
DATALENGTH(dest.text) AS DLength,
DATALENGTH(dest.text) / 2 AS HalfDLength,
der.statement_start_offset,
der.statement_end_offset
FROM sys.dm_exec_query_stats AS der
CROSS APPLY sys.dm_exec_sql_text(der.sql_handle) AS dest
WHERE der.statement_end_offset > -1

You can see that the character length is, whatever it’s supposed to be, but the DATALENGTH is twice that much. Unicode, as we all know, includes a byte to identify the character set. That’s included in the character count in statement_start_offset and statement_end_offset.  You need to take that into account when dealing with these “characters.”

One thought on “Characters

  • doneel

    Substring works okay for nvarchar fields.
    It is statement_start_offset and statement_end_offset that are “wrong” or confusing.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.