Category: Azure

Jan 18 2017

Scheduling Statistics Maintenance in Azure SQL Data Warehouse

The power and capabilities of Azure SQL Data Warehouse are pretty amazing. However, it’s still basically SQL Server under the covers, and SQL Server still needs maintenance. One type of maintenance is keeping statistics up to date. Statistics maintenance is doubly important within Azure SQL Data Warehouse because statistics are neither created nor maintained automatically. It’s up to you to ensure your statistics are up to date.

Statistics in Azure SQL Data Warehouse

It’s worth re-emphasizing the point that there is no automatic creation of statistics in Azure SQL Data Warehouse. This means you need to plan to add statistics to your tables. Microsoft recommends an easy method: add them to every column. While that does make things easy, it’s not like statistics are free. If you add them to every column, you’ll also have to maintain them on every column. That could be costly. I’d rather see a targeted approach, but that requires additional work and a good understanding of how the Data Warehouse is being used. That’s not always possible, hence Microsoft’s suggestion. Your methods of loading data into your Warehouse are also going to drive your statistics maintenance routines. It’s a complex dance that goes beyond the scope of this blog post. To read more about what Microsoft has to say about statistics in Azure SQL Data Warehouse, read this article.
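
To give a concrete sense of the targeted approach, here’s a minimal example. The dbo.DimReseller table and its ResellerName column come from the AdventureWorksDW sample used later in this post; swap in whatever columns actually show up in your joins and WHERE clauses:

-- Create a single-column statistics object on a column used for filtering or joining
CREATE STATISTICS ResellerName_Stat
ON dbo.DimReseller (ResellerName);

-- Once the statistics object exists, keeping it current is a separate, recurring task
UPDATE STATISTICS dbo.DimReseller;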

Automating Maintenance

Because we’ve moved into a Platform as a Service (PaaS) offering with Azure SQL Data Warehouse, we don’t have the types of automation mechanisms we’re accustomed to. By that I mean, there is no SQL Agent. Instead, we have Azure Automation:

AzureAutomation

I have an Azure SQL Data Warehouse I’m using for teaching this topic while on SQL Cruise. Let’s go ahead and add an automation account:

CruiseAutomation

NOTE: The most important habit you can start with in Azure is putting everything into discrete, planned, Resource Groups. These make management so much easier.

Once the account is set, the first thing you need to do is create a Runbook. There is a collection of them for your use within Azure, but none of them is immediately applicable to what I need. I’m just writing a really simple PowerShell script to do what I want:

$SqlUsername = 'Grant'
$SqlPwd = 'xxx'

# Define the connection to the SQL Database
$Conn = New-Object System.Data.SqlClient.SqlConnection("Server=tcp:cruise2017.database.windows.net;Database=Cruisers;User ID=$SqlUsername;Password=$SqlPwd;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;")
        
# Open the SQL connection
$Conn.Open()

# Define the SQL command to run. In this case we are updating the statistics on a table
$Cmd=new-object system.Data.SqlClient.SqlCommand("UPDATE STATISTICS dbo.DimReseller", $Conn)

# Execute the SQL command
$Cmd.ExecuteNonQuery()

# Close the SQL connection
$Conn.Close()

Nothing to it. Yes, for a production system you’ll want to be able to loop through all your tables, pick and choose which statistics get updated, etc. I’ll leave that to you to figure out for your system. It’s straightforward stuff.
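
If you’d rather push the looping down into a single T-SQL batch instead of doing it in PowerShell, here’s a rough sketch of the idea. Treat it as a starting point only; the temp table and loop syntax supported by Azure SQL Data Warehouse has some quirks, so test it against your own instance:

-- Sketch: build a numbered list of every user table, then walk the list running UPDATE STATISTICS
CREATE TABLE #StatsTables
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT ROW_NUMBER() OVER (ORDER BY t.object_id) AS Sequence,
       QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) AS TableName
FROM sys.tables AS t
JOIN sys.schemas AS s
   ON s.schema_id = t.schema_id;

DECLARE @i INT = 1;
DECLARE @max INT = (SELECT MAX(Sequence) FROM #StatsTables);

WHILE @i <= @max
BEGIN
   -- Build and run the UPDATE STATISTICS statement for the next table in the list
   DECLARE @sql NVARCHAR(400) = (SELECT N'UPDATE STATISTICS ' + TableName + N';'
                                 FROM #StatsTables
                                 WHERE Sequence = @i);
   EXEC (@sql);
   SET @i = @i + 1;
END;

DROP TABLE #StatsTables;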

NOTE: Thanks to Denny Cherry and Mike Fal for some pointers on troubleshooting setting this up. There’s a Test Pane that you need to make liberal use of along with Write-Output commands in order to identify where things are going south.

With the Runbook complete, save it and Publish it:

EditRunbook

Scheduling Automation

Once I’ve published my Runbook, I can simply click on the Schedule button in order to decide when I want it to run:

schedule

Obviously, you can pretty easily set up the schedule using all the standard options you’re accustomed to from SQL Agent. Once this is done, you’re all set. You’ve just successfully created a schedule for updating statistics in Azure SQL Data Warehouse.

Conclusion

While a long discussion can be had around when and where you need to create statistics inside Azure SQL Data Warehouse, there is no doubt that you will need to update those statistics. It can simply be a part of your load process. However, if you have a trickle load, or nightly updates, you’ll want to explore using Azure Automation to make that happen.

Dec 20 2016

Database Configuration

It’s amazing just how much the landscape changed with the release of SQL Server 2016 SP1. For example, I just found out that you can disable parameter sniffing at the database level using the database configuration. Not only does this work for SQL Server 2016 SP1, but it’s enabled for Azure SQL Database.

How Database Configuration Works

The syntax is very simple and documented here. So, if I want to disable parameter sniffing for a single database, I can do this:

ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;

That’s it. Done. It works from within the database and doesn’t require rebooting or anything else. Changing this setting does flush the cache of all the execution plans for that database. No other actions are necessary. You can control parameter sniffing at the database level.

Finally, to turn it back on, just modify the code above to set it to ON.
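
If you want to confirm the current value, or see the rest of the database scoped settings for the database, there’s a catalog view for that:

-- Check the current database scoped configuration values for this database
SELECT name,
       value
FROM sys.database_scoped_configurations;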

But Wait… Now How Much Would You Pay

There’s more. The database configuration supports a bunch of other database-specific settings that are pretty interesting. You can now set MAXDOP at the database level, without any need for setting up Resource Governor. You can modify the cardinality estimation engine independently from the compatibility level. You can also control the implementation of the query optimizer hotfixes, again at the database level.
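
For reference, those settings use the same syntax as the parameter sniffing example above. The values below are just illustrations, not recommendations:

-- Illustrative values only; pick what makes sense for your workload
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;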

All this makes for pretty exciting modifications at the database level to affect how your queries behave. However, Database Configuration creates additional things you’re going to want to check when you’re seeing differences in behavior between databases running on different servers. For example, you may see differences in execution plans between two servers because the databases have different settings for the cardinality estimator.

Dec 12 2016

OPTIMIZE FOR Hints When Parameter Sniffing is Turned Off

While presenting recently and talking about dealing with bad parameter sniffing, I got the question: what happens to OPTIMIZE FOR hints when parameter sniffing is disabled?

This is my favorite kind of question because the answer is simple: I don’t know.

Parameter Sniffing

For those who don’t know, parameter sniffing is when SQL Server uses the precise value passed into a query as a parameter (this means stored procedures or prepared statements) to look at the statistics and generate an execution plan tailored to that value. Most of the time, parameter sniffing is either helping you or, at worst, not hurting you. Sometimes, parameter sniffing turns bad and hurts you quite severely.

Usually, but not always, this is because you either have severely skewed data (some data is very different than the rest, lots of rows versus fewer or vice versa), or your statistics are out of date. The real issue is deciding how best to resolve the issue if you are dealing with bad parameter sniffing.

OPTIMIZE FOR

There are a bunch of ways to deal with parameter sniffing when it goes bad. All of them revolve around controlling what type of execution plan gets generated. The three most common methods for resolving bad parameter sniffing are: plan forcing using Query Store (currently available in Azure SQL Database and SQL Server 2016; I have written extensively on this topic), using a RECOMPILE hint to get a new plan every time, or using the OPTIMIZE FOR hint to get a plan based on a specific value or on the average of values.

Using OPTIMIZE FOR is pretty straightforward. Here’s an example query that uses the OPTIMIZE FOR hint to make the optimizer build the execution plan for a specific parameter value:

CREATE PROC dbo.AddressByCity @City NVARCHAR(30)
AS
   SELECT a.AddressID,
      a.AddressLine1,
      a.AddressLine2,
      a.City,
      sp.Name AS StateProvinceName,
      a.PostalCode
   FROM Person.Address AS a
   JOIN Person.StateProvince AS sp
      ON a.StateProvinceID = sp.StateProvinceID
   WHERE a.City = @City 
OPTION(OPTIMIZE FOR(@City='Mentor'));

Any time this stored procedure is called and that query gets compiled, or even if the statement gets recompiled, it will use the value of ‘Mentor’ to look at the statistics and determine how many rows are likely to be returned. This lets me take control away from the parameter sniffing process.
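
For contrast, here’s a quick sketch of the RECOMPILE approach mentioned earlier, written as a parameterized batch against the same tables rather than as a change to the stored procedure (the rest of this post carries on with the OPTIMIZE FOR version). With RECOMPILE, the statement gets a brand new plan, sniffed from the live parameter value, on every execution, at the cost of all those compiles:

-- Sketch only: OPTION (RECOMPILE) trades plan reuse for a freshly sniffed plan each run
EXEC sys.sp_executesql
    N'SELECT a.AddressID,
             a.AddressLine1,
             a.City
      FROM Person.Address AS a
      WHERE a.City = @City
      OPTION (RECOMPILE);',
    N'@City NVARCHAR(30)',
    @City = N'Mentor';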

Turning Off Parameter Sniffing Entirely

We can just completely eliminate parameter sniffing. There’s a trace flag that we can set:

DBCC TRACEON (4136,-1);

This doesn’t require a server reboot or anything. It’ll just stop using parameter sniffing… on the whole server. I strongly advocate against using this without very extensive testing to confirm that you’re not benefitting from parameter sniffing.
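
If you do experiment with the trace flag, here’s a quick way to check its status and to turn it back off again:

-- Verify whether 4136 is enabled, then disable it globally
DBCC TRACESTATUS (4136);
DBCC TRACEOFF (4136, -1);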

We can remove the hint from the query above and then turn off parameter sniffing. If I capture the execution plan and look at the SELECT properties, that’s where I would normally see the Compile Time values for the parameters. However, as you can see here, I don’t have a Compile Time value, just a Run Time value:

NoSniffing

What Happens To The OPTIMIZE FOR Hint

Let’s recompile the stored procedure using the OPTIMIZE FOR hint so that we can see what happens. Then, I’m going to call the procedure, but I’m going to pass a value that would result in a different execution plan:

EXEC dbo.AddressByCity @City = N'London';

Now, let’s take a look at the properties:

Sniffing

You can see that I have both a compile time value, and a run time value.

Conclusion

Just because I have parameter sniffing disabled doesn’t mean the query hints stop working. If I used OPTIMIZE FOR UNKNOWN instead of OPTIMIZE FOR a value, the results would be the same as when I disable parameter sniffing (the optimizer just uses an average instead of specific values). However, the other hint still pushes the optimizer to use a particular value, effectively making it do parameter sniffing even though parameter sniffing has been disabled. One other point: I also tested using OPTIMIZE FOR through a Plan Guide while parameter sniffing was disabled. This worked fine too.
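
Since the Plan Guide test only gets a passing mention above, here’s a rough sketch of what it looks like. The @stmt text has to match the statement inside the procedure exactly, so treat this as illustrative rather than copy-and-paste:

-- Sketch of attaching the OPTIMIZE FOR hint to dbo.AddressByCity through a plan guide
EXEC sys.sp_create_plan_guide
    @name = N'OptimizeForMentor',
    @stmt = N'SELECT a.AddressID,
      a.AddressLine1,
      a.AddressLine2,
      a.City,
      sp.Name AS StateProvinceName,
      a.PostalCode
   FROM Person.Address AS a
   JOIN Person.StateProvince AS sp
      ON a.StateProvinceID = sp.StateProvinceID
   WHERE a.City = @City',
    @type = N'OBJECT',
    @module_or_batch = N'dbo.AddressByCity',
    @params = NULL,
    @hints = N'OPTION (OPTIMIZE FOR (@City = N''Mentor''))';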

Dec 05 2016

Presentations for SQL Server Beginners

#iwanttohelp

Tired from all those blog posts

For my final blog post in the #gettingstarted, #iwanttohelp series, I decided to curate a set of presentations from the PASS Virtual Chapters. This content is available online. It’s free. Most importantly for the person just getting started as a SQL Server data pro, it’s good. I’m going to marry each of the presentations with my eleven blog posts in this series.

  1. The Importance of a Full Backup in SQL Server
    For this one I’m going to recommend Tim Radney’s session Understanding SQL Server Backup and Restore. I know Tim personally and guarantee this is a good session.
  2. Why Is The Server Slow
    Jes Borland is a very close personal friend and an absolutely outstanding presenter (and person). She has a session all about getting you started on wait statistics Beyond CXPacket: Understanding Wait Statistics.
  3. SQL Server Backups Are a Business Decision
    Paul Randal is, well, Paul Randal. If you’re not familiar with his work, you really should be. He presented a session called Building the Right Backup Strategy that should coincide nicely with my blog post.
  4. A View Is Not A Table
    OK, I couldn’t find a single session talking about views, so I’m going to come at this from the side, just a little (and I might start working on a presentation on views & other structures). This session by Kenneth Ureña, who I’ve talked with several times, is about structures and performance. It should do the trick. Indexing Strategies and Good Physical Designs for Performance Tuning.
  5. Use The Correct Data Type
    Silent Killers Lurking in Your Schema by (yes, a good friend) Mickey Steuwe is absolutely going to cover data types, among other things.
  6. Choosing the Right SQL Server Edition
    John Martin (yeah, another friend, what can I say, I try to stay on the good side of smart & capable people) has a presentation that’s a bit more focused on SQL Server internals but will absolutely address the edition of SQL Server as part of the work. Get SQL Server Set Up Right! The First Time.
    I’m also going to add a session by Jes Borland, again, that introduces Azure, especially talking about Azure in a hybrid scenario. Azure and SQL Server. Plus, I just want to share more Azure links. If you’re just getting started with the Microsoft Data Platform, start with Azure.
  7. Monitor Query Performance
    There are tons of choices here. I’m going with Kicking and Screaming: Replacing Profiler With Extended Events because I know that most people learn Profiler first. I also know that we need to get people to start using Extended Events instead. I will further add that Erin Stellato really knows her stuff on this topic, as well as many others. She is also a wonderful presenter (and a friend).
  8. Azure SQL Database For Your First Database
    As I said, I really believe in Azure as an entry into the land of Data Platform. It’s easy. It’s inexpensive. It’s available almost anywhere. Tim Radney has a session called Azure SQL Database for the Production DBA that will help to get you started.
  9. The Clustered Index is Vital To Your Database Design
    The clustered index is so important that everyone presents on it in one fashion or another. I’ve met Ayman El-Ghazali a few times. We’re not friends yet, but we get along (he’s very smart, see above). His session Indexing Fundamentals seems ideal for this slot.
  10. Statistics Are Vital For Performance
    Erin Stellato has a session called Statistics Starters. It is so important to understand what statistics are within SQL Server, how to use them, and how to maintain them, that it’s hard to over-emphasize this topic.
  11. PowerShell To Test a Query
    If you’re just getting going with Powershell, you’re going to need tons of help. Luckily it’s out there. I could have picked any number of excellent sessions by amazing people (many of them friends). I’m going with the PowerShell Tips and Tricks for SQL Server Administration by Mike Fal. Mike (a friend) recently helped me with some of my own PowerShell scripts, so he’s kind of top of mind at the moment.

Conclusion

For my final post I wanted to do three things. First, a recap of the last year’s worth of #gettingstarted blog posts. Believe it or not, I put a little work into each of these and it’s nice to see them all collected like this. Second, I wanted to illustrate the vast quantity of high quality learning that is available through PASS and the PASS Virtual Chapters. I pulled these sessions from a bunch of different chapters. There are alternatives I could have picked for every topic (except for views, something to work on). Which leads to my third and final reason for doing my last post in the series like this: I wanted to keep helping. You now know a great resource to go to and search through for more information and learning.

Before we go, I want to launch a small challenge of my own. If you’re a blogger, pick a topic (or a collection of topics like I did), and put together a blog post that curates the content using only PASS resources. Link back to this blog. Use the hash tag #PASScurated. Let’s see what other information can be put together.


Nov 09 2016

Azure Data Platform Learning Resources

I know that more and more of you are moving more and more resources into the Azure Data Platform. In an effort to help you learn about all that Azure offers, I created a GitHub repository of blogs, teachers, courses, etc. It’s open to the community to update. New material has been recently added and there will be more and more all the time. Check it out. If you know of resources that should be listed there, please submit a pull request and add them right in. This is your resource.

Sep 30 2016

Azure Data Platform Resources

A few months ago I created a GitHub repository for the purpose of building and maintaining a list of resources for Azure Data Platform training. You can see it here. My goal in putting this into GitHub instead of just running it on my blog is to make it a community resource. I want all of you to maintain it. If you’re teaching a class (one hour or one week, I don’t care), I’d like you to add yourself to the instructors list. If you have a blog where you post Azure content, please, add your blog. Are you a PowerBI monster? Get on the list. Please, help me create and grow this list so that people have a central, public, resource for this information. More and more of you are moving into Azure all the time. You need to know where to go to gain the knowledge necessary to be successful on the Azure Data Platform. That’s what the Azure Data Platform Learning Resources repository is all about.

Click the link. Edit a page. Do it!

Sep 12 2016

The Clustered Index Is Vital To Your Database Design

This is post #9 supporting  Tim Ford’s (b|t) initiative on #iwanttohelp, #entrylevel. Read about it here.

You get one clustered index per table.

That bears repeating, you get one clustered index per table.

Choosing a clustered index is an extremely important and fundamental aspect of all your SQL Server work. The one clustered index that you get determines how the data in your table is stored. Because the clustered index determines how your data is stored, it also determines how your data is retrieved. Much of SQL Server is engineered around the clustered index because it is such a foundational object for the rest of all behavior. Without a clustered index, the data in your table is stored in what is called a heap. It is essentially a pile, a heap, of data, stored without internal structure. The data in a heap can be found fairly quickly when only a single value is needed, but as soon as you start to retrieve ranges of data, more than one row, heaps become extremely problematic. Meanwhile, the clustered index stores all your data at the bottom of a B+Tree, structured storage, that is used to retrieve this data through the key, or keys, that you give your clustered index. Further, each non-clustered index also uses the key value from your clustered index as its means of retrieving data from the clustered index.

All this means that the vast majority of your tables should have a clustered index. Yes, there are exceptions, but, exceptions should be exceptional.

The standard advice, and it’s not wrong, is to make the clustered key as narrow as possible. Next, clustered indexes are designed to support monotonically increasing values extremely well. Clustered indexes must be unique. You either have to create them on a unique value, or, internally, you’ll get a UNIQUIFIER applied to the key to ensure that each value is unique. All this taken together means that they work nicely on IDENTITY columns. Because of this advice, you’ll frequently see all the clustered indexes in a database on the IDENTITY columns. However, this is a bit of a trap. You need to remember that the clustered index is not just about storage, but about retrieval. If the most common access path to your data is through, for example, a foreign key, that value might be a better choice for the clustered key. If you always retrieve the data through a date range, that might be a better choice for the clustered key. Keep in mind the standard advice for clustered keys: narrow, monotonically increasing, unique. However, also remember, they are where your data is stored. It is frequently better to sacrifice one or more of the standard practices in order to be able to use your clustered index to retrieve the data in most of your queries.
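
Here’s a small, hypothetical example of that trade-off. Everything in it is made up for illustration: a table where nearly every query filters on a date range, so the clustered index goes on the date while the IDENTITY column stays unique through a nonclustered primary key:

-- Hypothetical table: cluster on the common access path (OrderDate), not the IDENTITY column
CREATE TABLE dbo.OrderDetail
(
    OrderDetailID INT IDENTITY(1, 1) NOT NULL,
    OrderID INT NOT NULL,
    OrderDate DATE NOT NULL,
    ProductID INT NOT NULL,
    Quantity INT NOT NULL,
    CONSTRAINT PK_OrderDetail PRIMARY KEY NONCLUSTERED (OrderDetailID)
);

CREATE CLUSTERED INDEX CIX_OrderDetail_OrderDate
ON dbo.OrderDetail (OrderDate);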

Think through where and how to apply your clustered indexes, and don’t get stuck using them in a single manner, everywhere, within your system. You get one clustered index on a table. Be sure you put it somewhere it will both work well and be well used.

Aug 29 2016

Azure SQL Data Warehouse Execution Plans

Azure SQL Data Warehouse can sometimes feel like it’s completely different from SQL Server, but under the covers, it is still (mostly) SQL Server and it does have execution plans. Let’s take a look at one.

I’ve created a new SQL Data Warehouse using the sample database available on the portal, AdventureWorksDW. Here’s a query against that database:

SELECT dd.FullDateAlternateKey AS OrderDate,
       dc.LastName,
       SUM(fis.SalesAmount) AS SumSalesAmount
FROM dbo.FactInternetSales AS fis
JOIN dbo.DimDate AS dd
    ON fis.OrderDateKey = dd.DateKey
JOIN dbo.DimCustomer AS dc
    ON dc.CustomerKey = fis.CustomerKey
GROUP BY dd.FullDateAlternateKey,
         dc.LastName
HAVING SUM(fis.SalesAmount) > 5000.0
ORDER BY OrderDate DESC;

If I attempt to capture an execution plan using the SQL Server Management Studio GUI, nothing happens. If I try to use T-SQL commands, I get an error that those commands are not supported with this version of SQL Server. Same thing if I try to capture a plan using Visual Studio. So… now what? Enter our new command:

EXPLAIN
SELECT  dd.FullDateAlternateKey AS OrderDate,
        dc.LastName,
        SUM(fis.SalesAmount) AS SumSalesAmount
FROM    dbo.FactInternetSales AS fis
JOIN    dbo.DimDate AS dd
        ON fis.OrderDateKey = dd.DateKey
JOIN    dbo.DimCustomer AS dc
        ON dc.CustomerKey = fis.CustomerKey
GROUP BY dd.FullDateAlternateKey,
        dc.LastName
HAVING  SUM(fis.SalesAmount) > 5000.0
ORDER BY OrderDate DESC;

If I run this through the SSMS query window, I get a syntax error. So we’re now in Visual Studio. This is how we generate an execution plan from within Azure SQL Data Warehouse. What you get is XML output in the results like this:

xmlresults

According to the documentation on EXPLAIN, I should be able to click on the XML and it will open up to explore. In my version of Visual Studio (2015), I didn’t find that to be the case. Instead I had to copy and paste the XML into an XML file window that I created within Visual Studio. This is what I finished with:

<?xml version="1.0" encoding="utf-8"?>
<dsql_query number_nodes="1" number_distributions="60" number_distributions_per_node="60">
  <sql>SELECT  dd.FullDateAlternateKey AS OrderDate,          dc.LastName,          SUM(fis.SalesAmount) AS SumSalesAmount  FROM    dbo.FactInternetSales AS fis  JOIN    dbo.DimDate AS dd          ON fis.OrderDateKey = dd.DateKey  JOIN    dbo.DimCustomer AS dc          ON dc.CustomerKey = fis.CustomerKey  GROUP BY dd.FullDateAlternateKey,          dc.LastName  HAVING  SUM(fis.SalesAmount) &gt; 5000.0  ORDER BY OrderDate DESC</sql>
  <dsql_operations total_cost="5.98868068474576" total_number_operations="13">
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_14</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>
        <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_14] ([DateKey] INT NOT NULL, [FullDateAlternateKey] DATE NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="BROADCAST_MOVE">
      <operation_cost cost="1.99584" accumulative_cost="1.99584" average_rowsize="7" output_rows="1188" GroupNumber="12" />
      <source_statement>SELECT [T1_1].[DateKey] AS [DateKey],         [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey]  FROM   [DWTest].[dbo].[DimDate] AS T1_1</source_statement>
      <destination_table>[TEMP_ID_14]</destination_table>
    </dsql_operation>
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_15</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_15] ([OrderDateKey] INT NOT NULL, [CustomerKey] INT NOT NULL, [SalesAmount] MONEY NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="SHUFFLE_MOVE">
      <operation_cost cost="3.93098847457627" accumulative_cost="5.92682847457627" average_rowsize="16" output_rows="60398" GroupNumber="10" />
      <source_statement>SELECT [T1_1].[OrderDateKey] AS [OrderDateKey],         [T1_1].[CustomerKey] AS [CustomerKey],         [T1_1].[SalesAmount] AS [SalesAmount]  FROM   [DWTest].[dbo].[FactInternetSales] AS T1_1</source_statement>
      <destination_table>[TEMP_ID_15]</destination_table>
      <shuffle_columns>CustomerKey;</shuffle_columns>
    </dsql_operation>
    <dsql_operation operation_type="RND_ID">
      <identifier>TEMP_ID_16</identifier>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_16] ([FullDateAlternateKey] DATE NOT NULL, [LastName] NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS, [col] MONEY NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="SHUFFLE_MOVE">
      <operation_cost cost="0.0618522101694915" accumulative_cost="5.98868068474576" average_rowsize="111" output_rows="136.985" GroupNumber="25" />
      <source_statement>SELECT [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey],         [T1_1].[LastName] AS [LastName],         [T1_1].[col] AS [col]  FROM   (SELECT   SUM([T2_2].[SalesAmount]) AS [col],                   [T2_1].[FullDateAlternateKey] AS [FullDateAlternateKey],                   [T2_2].[LastName] AS [LastName]          FROM     [tempdb].[dbo].[TEMP_ID_14] AS T2_1                   INNER JOIN                   (SELECT [T3_2].[OrderDateKey] AS [OrderDateKey],                           [T3_2].[SalesAmount] AS [SalesAmount],                           [T3_1].[LastName] AS [LastName]                    FROM   [DWTest].[dbo].[DimCustomer] AS T3_1                           INNER JOIN                           [tempdb].[dbo].[TEMP_ID_15] AS T3_2                           ON ([T3_2].[CustomerKey] = [T3_1].[CustomerKey])) AS T2_2                   ON ([T2_1].[DateKey] = [T2_2].[OrderDateKey])          GROUP BY [T2_1].[FullDateAlternateKey], [T2_2].[LastName]) AS T1_1</source_statement>
      <destination_table>[TEMP_ID_16]</destination_table>
      <shuffle_columns>FullDateAlternateKey;</shuffle_columns>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_15]</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllComputeNodes" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_14]</sql_operation>
      </sql_operations>
    </dsql_operation>
    <dsql_operation operation_type="RETURN">
      <location distribution="AllDistributions" />
      <select>SELECT   [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey],           [T1_1].[LastName] AS [LastName],           [T1_1].[col] AS [col]  FROM     (SELECT [T2_1].[FullDateAlternateKey] AS [FullDateAlternateKey],                   [T2_1].[LastName] AS [LastName],                   [T2_1].[col] AS [col]            FROM   (SELECT   SUM([T3_1].[col]) AS [col],                             [T3_1].[FullDateAlternateKey] AS [FullDateAlternateKey],                             [T3_1].[LastName] AS [LastName]                    FROM     [tempdb].[dbo].[TEMP_ID_16] AS T3_1                    GROUP BY [T3_1].[FullDateAlternateKey], [T3_1].[LastName]) AS T2_1            WHERE  ([T2_1].[col] &gt; CAST ((5000.0) AS DECIMAL (5, 1)))) AS T1_1  ORDER BY [T1_1].[FullDateAlternateKey] DESC</select>
    </dsql_operation>
    <dsql_operation operation_type="ON">
      <location permanent="false" distribution="AllDistributions" />
      <sql_operations>
        <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_16]</sql_operation>
      </sql_operations>
    </dsql_operation>
  </dsql_operations>
</dsql_query>

So now we just save this as a .sqlplan file and open it in SSMS, right?

Nope!

See, that’s not a regular execution plan, at all. Instead, it’s a D-SQL plan. It’s not the same as our old execution plans. You can’t open it as a graphical plan (and no, not even in that very popular 3rd party tool, I tried). You will have to learn how to read these plans differently because, well, they are different.

Aug 17 2016

Updates to Azure Data Platform Learning Resources

I’m acting as the curator to a list of Azure Data Platform Learning Resources. It’s speakers and teachers, blogs, articles and more, all focused around the Azure Data Platform. I have it hosted on GitHub so that anyone can contribute. There have been several updates recently that you may want to take a look at.

If you’re looking for where and how to get started in the Azure Data Platform, this is a great resource to get you going.

If you’re teaching the Azure Data Platform, anything from PowerBI to DocumentDB to Azure SQL Data Warehouse and all the stuff in between, please add your information so that this list is up to date.

Aug 15 2016

Query Store, Force Plan and Dropped Objects

I love the Query Store. Seriously. It’s a huge leap forward in the capabilities of Azure SQL Database and SQL Server in support of performance monitoring and query optimization. One of my favorite aspects of the Query Store is the ability to force plans. Frankly though, it’s also the scariest part of the Query Store. I do believe that plan forcing will be one of the most ill-used functions in SQL Server since the multi-statement table-valued user-defined function (don’t get me started). However, unlike the UDF, this ill-use will be because of poor understanding on the part of the user, not a fundamental design issue. No, plan forcing and the Query Store are very well constructed. Let me give you an example of just how well constructed they are.

Let’s imagine that we have a situation such as bad parameter sniffing where you’ve determined that, of the more than one possible execution plans for a query, there is a preferred plan. Enabling plan forcing to ensure that plan gets used is a no-brainer. Let’s further imagine that you have a junior DBA who is… let’s just say overly aggressive in their duties, such that they do silly things occasionally. What happens when your pretty plan, which uses a particular index, meets your junior DBA who just dropped that index?

Here’s the setup. We’re using the WideWorldImporters database and we have this query:

SELECT  *
FROM    Warehouse.StockItemTransactions AS sit
WHERE   sit.TransactionOccurredWhen BETWEEN '9/9/2015'
                                    AND     '9/11/2015';

This query, with the default configuration, will scan the existing table, so I’ll add an index:

CREATE INDEX TransactionOccurredWhenNCI
ON Warehouse.StockItemTransactions
(TransactionOccurredWhen);

For a limited range such as the one I’m passing above, I’ll get a plan with a key lookup operation which runs faster than the scan, so I’m happy. For a broader range, I’m likely to see a scan again, but since most of my queries have a very narrow range, I’d sure like to be able to force the plan to always compile to the seek and key lookup. To do this I need to find the query_id and plan_id from the Query Store (assuming I’m not using the GUI):

SELECT  qsp.plan_id,
        qsp.query_id,
		qsqt.query_sql_text,
		qsp.count_compiles
FROM    sys.query_store_plan AS qsp
JOIN    sys.query_store_query AS qsq
        ON qsq.query_id = qsp.query_id
JOIN    sys.query_store_query_text AS qsqt
        ON qsqt.query_text_id = qsq.query_text_id
WHERE   qsqt.query_sql_text LIKE 'SELECT  *
FROM    Warehouse.StockItemTransactions AS sit%';

With those values, I can force the execution plan so that it will always use the plan I want:

EXEC sys.sp_query_store_force_plan @query_id = 42460, @plan_id = 463;

That’s it. I’m happy because I’m going to see the execution plan used over and over, despite any values passed during a recompile.

Then…

Along comes our aggressive junior DBA who decides that there are “too many” indexes on the server. No, I don’t know what that means either, but they evidently read it on the internet or something so they drop the index we created before:

DROP INDEX TransactionOccurredWhenNCI ON Warehouse.StockItemTransactions;

What now happens to our lovely execution plan and the plan forcing? We’ll take a look at two events in Extended Events, sql_statement_recompile and query_store_plan_forcing_failed. Nothing happens immediately on dropping the index. The plans associated with that object, if any, are marked as invalid in the cache. The next time we call the query it’s going to recompile and we can see the event:

recompile_event

The most important part of the event is the recompile_cause, which is “Schema changed”. However, I would also note the attach_activity_id.guid. I’ve chosen to enable causality tracking in this Extended Events session. This will cause all events associated with a common activity to get a GUID and then a sequence. This is interesting because, after the recompile event, we get the query_store_plan_forcing_failed event:

plan_forcing_failed

The guid value is the same as the event above and the *.seq number is now 2, showing that, for these events, the recompile event occurred and then this event occurred. That makes perfect sense. The plan is marked for recompile, so, it’s going to be recompiled. I have enabled plan forcing though, so I have a particular plan that I want the optimizer to use. However, thanks to my “helpful” junior DBA, the plan is now invalid. You even get the description of what happened in the message field for the event:

Index ‘WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhenNCI’, specified in the USE PLAN hint, does not exist. Specify an existing index, or create an index with the specified name.

The first question now is, what happens with this query and the execution plan? Does the new plan generated now that the index is missing get stored in cache? Yes, it does. We can validate that by querying the cache, or, when capturing the actual execution plan, checking the “Retrieved from cache” property.
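
One rough way to do that check, assuming the WideWorldImporters query text from above, is to look for the statement in the plan cache and watch the use counts climb without any new compiles:

-- Sketch: find the cached plan for the StockItemTransactions query
SELECT cp.usecounts,
       cp.objtype,
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%StockItemTransactions%'
      AND st.text NOT LIKE N'%dm_exec_cached_plans%';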

Because plan forcing is enabled, do we see a recompile every time this query is called? The answer to that question is slightly complex. Under normal circumstances, no. As long as that plan remains in cache, it’s simply reused. No other recompiles occur. A normal recompile event will cause another attempt at applying the invalid execution plan and we would see yet another query_store_plan_forcing_failed event for each recompile on the query. However, during testing, Joey D’Antoni (who was helping me play with this when we discussed what would happen when a plan was made invalid) had severe memory pressure on his server. He saw intermittent recompiles with a cause message that said plan forcing had failed. So if your server is under extreme stress and you cause this issue, you might see different messages. Just remember, the cause of the recompiles was not the plan forcing, but the memory pressure.

The fun thing is, as long as I don’t remove the plan forcing or take the query and plan out of the Query Store manually, if I recreate the index on my table with the same name and definition as that expected by the plan, the Query Store will simply reapply the plan and then successfully force it during any subsequent recompile situation. This is because Query Store is persisted with the database and barring outside activity, the information there will remain, just like the rest of the data in the database.

All of this means that Query Store works exactly the way we would expect, not forcing additional recompiles when you, or your junior DBA, inadvertently invalidate a plan. It also works as expected in that forcing a plan is stored with your database so that, assuming you don’t remove that plan from the Query Store, it will simply be reapplied after you fix the problem. It’s fun to see the thought that went behind the design of the behavior of Query Store. However, please, use plan forcing judiciously.