Refining the Missing Index Data

In my previous post I showed how you could determine whether a query had missing index information, but it was difficult to get meaningful data out because I didn't know XQuery well enough. I've spent some more time refining the query and here are the results. This new query shows some basic performance information gathered from the aggregate data in sys.dm_exec_query_stats and combines it with the full data pulled from the MissingIndexes element in the XML of the execution plan. I've got it listing all the recommended column groups and columns, which means the performance data is repeated in order to allow for the full listing of groups and columns. It seems to work well.

A couple of interesting points. My purpose is to provide a short-cut method of identifying possible indexing needs for systems that run lots of dynamic or ad hoc queries. In my own environment that mainly means either Microsoft CRM or nHibernate, but the same rules would apply to almost any other client-side query engine. I wanted to list the database name as part of the query. A dbid column is included in the output of sys.dm_exec_query_plan, but unfortunately, when a query is prepared or ad hoc, that value is null. Luckily enough, the data is available in the execution plan itself. That's going to be something to watch for in the future when using these execution plans.
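
Just to illustrate the point, here's a minimal sketch (not part of the main script below) that puts the dbid column from sys.dm_exec_query_plan next to the dbid plan attribute, which is still populated for ad hoc and prepared plans:

SELECT s.plan_handle
,p.dbid -- null for ad hoc and prepared plans
,pa.value AS dbid_attribute -- the execution context database, still available
FROM sys.dm_exec_query_stats AS s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) AS p
CROSS APPLY sys.dm_exec_plan_attributes(s.plan_handle) AS pa
WHERE pa.attribute = 'dbid';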

Here’s the query:
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' AS sp)
SELECT p.query_plan.value(N'(sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/@Database)[1]', 'NVARCHAR(256)') AS DatabaseName
,s.sql_handle
,s.total_elapsed_time
,s.last_execution_time
,s.execution_count
,s.total_logical_writes
,s.total_logical_reads
,s.min_elapsed_time
,s.max_elapsed_time
,p.query_plan
,p.query_plan.value(N'(sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/@Table)[1]', 'NVARCHAR(256)') AS TableName
,p.query_plan.value(N'(/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/@Schema)[1]', 'NVARCHAR(256)') AS SchemaName
,p.query_plan.value(N'(/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/@Impact)[1]', 'DECIMAL(6,4)') AS ProjectedImpact
,ColumnGroup.value('./@Usage', 'NVARCHAR(256)') AS ColumnGroupUsage
,ColumnGroupColumn.value('./@Name', 'NVARCHAR(256)') AS ColumnName
FROM sys.dm_exec_query_stats s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) AS p
CROSS APPLY p.query_plan.nodes('/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup/sp:MissingIndex/sp:ColumnGroup') AS t1 (ColumnGroup)
CROSS APPLY t1.ColumnGroup.nodes('./sp:Column') AS t2 (ColumnGroupColumn)
WHERE p.query_plan.exist(N'/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple/sp:QueryPlan//sp:MissingIndexes') = 1
ORDER BY s.total_elapsed_time DESC

24 thoughts on “Refining the Missing Index Data”

  • Excellent query; I can't stress how important this query is to have in one's toolbox. After correcting the ` and ’ characters to ' and adding a TOP 20 to the SELECT clause I immediately got valuable results.

    And what do you know, the biggest offender on one of our production databases is an external party that has one database on our server for an application that serves one of our own customers.

    Many thanks for this wonderful query…..I rate it 10/10 🙂

  • scarydba

    Wow! Thank you. I'm glad it was useful. I know we're starting to put it to work within our team. I had finished it without the database name in the query, but when one of my co-workers started testing it we found one database that had so many badly performing queries with missing index reports (we knew it had problems, but…) that I had to get the database name in for filtering purposes.

    I’m sure there are better ways to perform the XQuery parts of the script because I’m barely getting started with my XQuery syntax.

  • I am new to XQuery; I expected XPath when working with XML (both of which I know quite well). It is something else I will have to sink my teeth into, as I can see it becoming more important in the future.

    One thing people can find intimidating is the output in SQL Server 2005. The query plan XML can be clicked on, and then it just opens a raw XML document in a separate tab.

    But in SQL Server 2008 it will take you directly to a graphical query plan, and you can right-click in the header to get the original query text (complete with formatting) and a create script for the suggested index.

    Which is all very neat!

    We have now scheduled a cleanup day for next week to fix badly running queries on our servers and educate people on what to look for. Personally, I am trying to make the output easier to read by gluing the suggested index fields onto one line, so that a TOP 20 means 20 queries instead of 20 fields in total.

    Once I get mine perfect (which can take a while), I will post it here!

    Again, many thanks!

  • scarydba

    Excellent. I look forward to it. After you get it working really well, go on over to SQL Server Central and post it in their script library too. A lot more people will have access to it and see it there. But still post it here too, or at least post a pointer. I'd be very interested.

  • andrew

    Any chance of mailing me this? I have cut and pasted it into Management Studio and got rid of the funky quotation marks, but I'm still getting errors. I don't know what I am missing.

    Andrew

  • scarydba

    Sorry it took so long to get back. Glad you worked it out. Sorry about the funky quotes. I’m still wrestling with getting code to publish appropriately to the blog.

  • andrew

    ScaryDba,

    I have been using this code and have been able to tame some pretty bad performance from some dynamic SQL. What I would like to do now is parse out the p.query_plan field and get the missing index detail; that way I could run this in a job, save the missing index details over time, and run some queries to figure out which changes would be of the most benefit over time. I am trying to figure this out, but as always, if the wheel has already been invented I would rather not reinvent it. :-)

    Regards

    Andrew

  • scarydba

    Hey Andrew,

    Yeah, that’s the direction we’re headed in. No wheels on the bus yet though. We’re just thinking about it at the moment… how often to query it, how long to keep it, how to aggregate it… I’m leaning pretty heavily towards adding in the query hash as a mechanism for aggregation & identification. More experiments over the next couple of weeks. I’ll post again when I get something. However, if you beat me to it, please share.
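
    Just as a rough illustration of the aggregation idea, here is a sketch that assumes the query_hash column SQL Server 2008 adds to sys.dm_exec_query_stats (treat it as an illustration, not a finished solution):

    SELECT s.query_hash
    ,SUM(s.execution_count) AS total_executions
    ,SUM(s.total_elapsed_time) AS total_elapsed_time
    ,MAX(s.last_execution_time) AS last_execution_time
    FROM sys.dm_exec_query_stats AS s
    GROUP BY s.query_hash
    ORDER BY SUM(s.total_elapsed_time) DESC;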

  • greg

    After replacing the quotes, I'm still unable to get this to work. My assumption is that it's because I'm on a SQL Server 2005 instance but the database is running in 2000 compatibility mode. Is my assumption correct? (The error says "the value function requires 2 argument(s)".)

    • scarydba

      I am sorry about the quotes. I can’t install a code formatting widget unless I go on the pay site.

      I'm not sure what would happen if you were running 2000 compatibility mode, but the query does work on a 2005 server; I've used it a bunch.
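
      If you want to rule compatibility mode in or out, a quick check (just a sketch, not something from the original reply) is to look at the compatibility level directly:

      SELECT name, compatibility_level
      FROM sys.databases; -- 80 = SQL Server 2000 compatibility mode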

  • LimeGargoyle

    Greg:
    On items 1 and 11-15, make certain that you caught the funky quotes around the comma separating the arguments of the value function. I missed one set and got the same error. Try replacing them even if they LOOK right.

  • ksper

    I receive an error when I run this against one of my servers:
    XML datatype instance has too many levels of nested nodes. Maximum allowed depth is 128 levels.
    Any ideas ?

  • scarydba

    Unless you've got missing index queries that are suggesting more than 128 columns… which is scary, but interesting. Can you see the missing index information in the raw XML, without my query running?

  • andrew

    This is what I ended up doing to get the index definition without opening up Management Studio: I collect the details into a table on the DB being examined for missing indexes (run every minute), and then run the script below to loop through the table and build the index columns. It's not very pretty, but it appears to work; I then do a bunch of string manipulation to get the final definition.

    declare @MissingIndexDefinition NVARCHAR(2000)
    IF OBJECT_ID (N'tempdb.dbo.MissingIndexDetailsProcess', N'U') IS NOT NULL
    DROP TABLE tempdb.dbo.MissingIndexDetailsProcess
    Create Table tempdb.dbo.MissingIndexDetailsProcess (
    [ID] [int] NOT NULL ,
    sql_handle_id varbinary(max),
    execution_count int,
    Avg_CPU_Time int,
    Query_Text nvarchar(max),
    query_plan xml NOT NULL,
    TableName nvarchar(250),
    ColumnGroupUsage nvarchar(250),
    ColumnName nvarchar(250),
    ProjectedImpact float,
    [Processed] [bit] NOT NULL CONSTRAINT [MissingIndexDetailsProcess1] DEFAULT (0),
    InsertDate datetime,
    Index_Definition nvarchar(max))

    IF OBJECT_ID (N'tempdb.dbo.MissingIndexDetails', N'U') IS NOT NULL
    DROP TABLE tempdb.dbo.MissingIndexDetails
    Create Table tempdb.dbo.MissingIndexDetails (
    [ID] [int] IDENTITY (1, 1) NOT NULL ,
    sql_handle_id varbinary(max),
    execution_count int,
    Avg_CPU_Time int,
    Query_Text nvarchar(max),
    query_plan xml NOT NULL,
    TableName nvarchar(250),
    ColumnGroupUsage nvarchar(250),
    ColumnName nvarchar(250),
    ProjectedImpact float,
    [Processed] [bit] NOT NULL CONSTRAINT [tempdb.dbo.MissingIndexDetails] DEFAULT (0),
    InsertDate datetime,
    Index_Definition nvarchar(max))

    insert into tempdb.dbo.MissingIndexDetails
    (sql_handle_id, execution_count, Avg_CPU_Time, Query_Text, query_plan,
    TableName, ColumnGroupUsage, ColumnName, ProjectedImpact, InsertDate)
    select sql_handle, execution_count, Avg_CPU_Time, Query_Text,
    query_plan, TableName, ColumnGroupUsage, ColumnName, ProjectedImpact, getdate() from DBA_MissingIdexDetails
    update tempdb.dbo.MissingIndexDetails set Index_Definition = ''

    while (select Count(*)
    from tempdb.dbo.MissingIndexDetails where Processed = 0
    ) > 0

    begin

    insert into tempdb.dbo.MissingIndexDetailsProcess
    select * from tempdb.dbo.MissingIndexDetails
    where sql_handle_id in

    (select sql_handle_id from tempdb.dbo.MissingIndexDetails
    where id in
    (select min(id) from
    (select min(id) as id, sql_handle_id from tempdb.dbo.MissingIndexDetails
    where Processed = 0
    group by sql_handle_id) T1))

    while (select Count(*)
    from tempdb.dbo.MissingIndexDetailsProcess where Processed = 0
    ) > 0

    begin
    select @MissingIndexDefinition =
    (select case when ColumnGroupUsage = 'EQUALITY' then ' #'+ColumnName
    when ColumnGroupUsage = 'INEQUALITY' then ' #'+ColumnName
    when ColumnGroupUsage = 'INCLUDE' then ' @'+ColumnName
    end
    from tempdb.dbo.MissingIndexDetailsProcess
    where id in
    (select top 1 id from tempdb.dbo.MissingIndexDetailsProcess where Processed = 0
    order by id))

    update tempdb.dbo.MissingIndexDetailsProcess
    set Index_Definition = Index_Definition+@MissingIndexDefinition
    where Processed = 0

    update tempdb.dbo.MissingIndexDetailsProcess set Processed = 1
    where id in (select top 1 id
    from tempdb.dbo.MissingIndexDetailsProcess where Processed = 0
    order by id)

    end

    set @MissingIndexDefinition = ''

    update tempdb.dbo.MissingIndexDetails set Processed = 1
    where sql_handle_id in

    (select sql_handle_id from tempdb.dbo.MissingIndexDetails
    where id in
    (select min(id) from
    (select min(id) as id, sql_handle_id from tempdb.dbo.MissingIndexDetails
    where Processed = 0
    group by sql_handle_id) T1))

    end

    truncate table tempdb.dbo.MissingIndexDetails

    set identity_insert tempdb.dbo.MissingIndexDetails on

    insert into tempdb.dbo.MissingIndexDetails
    (ID, sql_handle_id, execution_count, Avg_CPU_Time, Query_Text, query_plan, TableName, ColumnGroupUsage,
    ColumnName, ProjectedImpact, Processed, InsertDate, Index_Definition)
    select * from tempdb.dbo.MissingIndexDetailsProcess
    set identity_insert tempdb.dbo.MissingIndexDetails off

  • These are the functions I ended up with:

    Function 1: Optimized for quick but potentially inaccurate database filtering, which is otherwise very slow when applied to a large production environment.

    create function dbo.fn_MissingIndexes( @database sysname = null ) returns table
    as
    return
    (
    -- Missing index detection

    with
    xmlnamespaces( 'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as sp )
    select
    p.query_plan
    , ig.missingIndexGroup.value( '@Impact', 'decimal(6,4)' ) as [impact]
    , i.missingIndex.value ( '@Database', 'nvarchar(256)' ) as [database]
    , i.missingIndex.value ( '@Schema', 'nvarchar(256)' ) as [schema]
    , i.missingIndex.value ( '@Table', 'nvarchar(256)' ) as [table]
    , cast( s.total_elapsed_time as money ) / 1000000 as [total_elapsed_time]
    , cast( s.min_elapsed_time as money ) / 1000000 as [min_elapsed_time]
    , cast( s.max_elapsed_time as money ) / 1000000 as [max_elapsed_time]
    , s.execution_count
    , s.last_execution_time
    , s.total_logical_writes
    , s.total_logical_reads
    , stuff
    (
    (
    select
    cast( ', ' as varchar(max) )
    + case when cg.indexColumnGroup.value ( '@Usage', 'nvarchar(256)' ) = 'INCLUDE' then 'include: ' else '' end
    + c.indexColumn.value( '@Name', 'nvarchar(256)' )
    from
    i.missingIndex.nodes( 'sp:ColumnGroup' )
    as cg( indexColumnGroup )

    cross apply cg.indexColumnGroup.nodes( 'sp:Column' )
    as c( indexColumn )
    for
    xml path( '' )
    )
    , 1
    , 2
    , '' ) as 'index_columns'
    , st.StmtSimple.value ( '@StatementText', 'nvarchar(max)' ) as [StmtSimple]
    , s.sql_handle
    from
    sys.dm_exec_query_stats as s
    cross apply sys.dm_exec_sql_text( s.sql_handle ) as q
    cross apply sys.dm_exec_query_plan( s.plan_handle ) as p

    cross apply p.query_plan.nodes( '/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple' )
    as st( StmtSimple )

    cross apply st.StmtSimple.nodes( 'sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup' )
    as ig( missingIndexGroup )

    cross apply ig.missingIndexGroup.nodes( 'sp:MissingIndex' )
    as i( missingIndex )
    where
    q.text not like '%"http://schemas.microsoft.com/sqlserver/2004/07/showplan"%' and

    -- quick database detection (this is the context of execution, not necessarily the database of the object that needs an index)

    (
    @database is null or
    exists( select 1 from sys.dm_exec_plan_attributes( s.plan_handle ) as a where a.attribute = 'dbid' and a.value = DB_ID( @database ) )
    )
    )
    ;
    go

    select top 5 * from master.dbo.fn_MissingIndexes( default )
    order by
    -- last_execution_time desc
    -- execution_count desc, total_elapsed_time desc
    total_elapsed_time desc
    -- total_logical_writes desc
    -- total_logical_reads desc
    -- min_elapsed_time desc

    option( recompile )
    ;

    select top 5 * from master.dbo.fn_MissingIndexes( 'mydatabase' )
    order by
    -- last_execution_time desc
    -- execution_count desc, total_elapsed_time desc
    total_elapsed_time desc
    -- total_logical_writes desc
    -- total_logical_reads desc
    -- min_elapsed_time desc

    option( recompile )
    ;

  • Function 2: Slow but accurate database filtering.

    create function dbo.fn_MissingIndexesExact( @database sysname = null ) returns table
    as
    return
    (
    -- Missing index detection

    with
    xmlnamespaces( 'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as sp )
    select
    p.query_plan
    , ig.missingIndexGroup.value( '@Impact', 'decimal(6,4)' ) as [impact]
    , i.missingIndex.value ( '@Database', 'nvarchar(256)' ) as [database]
    , i.missingIndex.value ( '@Schema', 'nvarchar(256)' ) as [schema]
    , i.missingIndex.value ( '@Table', 'nvarchar(256)' ) as [table]
    , cast( s.total_elapsed_time as money ) / 1000000 as [total_elapsed_time]
    , cast( s.min_elapsed_time as money ) / 1000000 as [min_elapsed_time]
    , cast( s.max_elapsed_time as money ) / 1000000 as [max_elapsed_time]
    , s.execution_count
    , s.last_execution_time
    , s.total_logical_writes
    , s.total_logical_reads
    , stuff
    (
    (
    select
    cast( ', ' as varchar(max) )
    + case when cg.indexColumnGroup.value ( '@Usage', 'nvarchar(256)' ) = 'INCLUDE' then 'include: ' else '' end
    + c.indexColumn.value( '@Name', 'nvarchar(256)' )
    from
    i.missingIndex.nodes( 'sp:ColumnGroup' )
    as cg( indexColumnGroup )

    cross apply cg.indexColumnGroup.nodes( 'sp:Column' )
    as c( indexColumn )
    for
    xml path( '' )
    )
    , 1
    , 2
    , '' ) as 'index_columns'
    , st.StmtSimple.value ( '@StatementText', 'nvarchar(max)' ) as [StmtSimple]
    , s.sql_handle
    from
    sys.dm_exec_query_stats as s
    cross apply sys.dm_exec_sql_text( s.sql_handle ) as q
    cross apply sys.dm_exec_query_plan( s.plan_handle ) as p

    cross apply p.query_plan.nodes( '/sp:ShowPlanXML/sp:BatchSequence/sp:Batch/sp:Statements/sp:StmtSimple' )
    as st( StmtSimple )

    cross apply st.StmtSimple.nodes( 'sp:QueryPlan/sp:MissingIndexes/sp:MissingIndexGroup' )
    as ig( missingIndexGroup )

    cross apply ig.missingIndexGroup.nodes( 'sp:MissingIndex' )
    as i( missingIndex )
    where
    q.text not like '%"http://schemas.microsoft.com/sqlserver/2004/07/showplan"%' and

    -- Affected database based on actual object in need of an index

    i.missingIndex.value ( '@Database', 'nvarchar(256)' ) = '[' + @database + ']'
    )
    ;
    go

    select top 5 * from master.dbo.fn_MissingIndexesExact( default )
    order by
    -- last_execution_time desc
    -- execution_count desc, total_elapsed_time desc
    total_elapsed_time desc
    -- total_logical_writes desc
    -- total_logical_reads desc
    -- min_elapsed_time desc
    ;

    select top 5 * from master.dbo.fn_MissingIndexesExact( 'MyDatabase' )
    order by
    -- last_execution_time desc
    -- execution_count desc, total_elapsed_time desc
    total_elapsed_time desc
    -- total_logical_writes desc
    -- total_logical_reads desc
    -- min_elapsed_time desc
    ;

  • You can see from the way I invoke the functions in my examples that I created them in my master database.

    Another thing you can do to speed the searching up is to run this command once:

    DBCC FREEPROCCACHE;

    This clears your complete query plan cache and all the accumulated statistics attached to it. Then let your system work as normal for an hour to a day to build up new statistics, and run the queries to see what could be improved.

    When you get results, do not blindly add new indexes as the plans might suggest. Study the situation carefully, as too many indexes have downsides too. Just use the suggestions as input to re-plan the indexes for an entire table as a whole, so that it serves all the queries running against it.

  • scarydba

    Interesting stuff. I’ll need to play with it a bit. Thanks for sharing.

    One point: I would not recommend running DBCC FREEPROCCACHE on a production system that's in use, especially on a heavily used system. Getting all the procs back into cache can be an expensive operation for some systems.

  • I agree with your recommendation of not using it on a heavily loaded 24/7 system. It would get hit with a brick if you cleared the cache completely in such a situation. The servers I work with are more tied to office hours; they become quiet before midnight and stay that way till 6am (bar the backup period). These servers have time to get up to speed as more users arrive and start processes performing DB access. It is not like they go from nothing to peak load. :)

  • scarydba

    Hey Doug,

    No, I suspect they wouldn't correlate. The total_elapsed_time is going to be the actual run time for the queries, and the ProjectedImpact is going to be the optimizer's best guess at how the query would be affected by adding the index it thinks it could use. I'd be surprised if there was a correlation between those two values, because the estimated values from the optimizer are just not to be really, entirely trusted. That doesn't make it any less of a useful measure; it would be one. You'd just need to be sure not to rely on it completely as TRUTH.

    I used total_elapsed_time, but average elapsed time, the number of executions, or any of the other measures would be equally valid.
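
    For instance, here is a quick sketch (an illustration only, not part of the original script) of sorting by average elapsed time derived from the same aggregate columns:

    SELECT TOP 20 s.plan_handle
    ,s.execution_count
    ,s.total_elapsed_time / s.execution_count AS avg_elapsed_time
    FROM sys.dm_exec_query_stats AS s
    ORDER BY avg_elapsed_time DESC; -- or execution_count DESC, total_logical_reads DESC, etc.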
