The power and capabilities of Azure SQL Data Warehouse are pretty amazing. However, it’s still basically SQL Server under the covers. SQL Server still needs maintenance and one type of maintenance is keeping statistics up to date. Statistics maintenance is doubly important within Azure SQL Data Warehouse because statistics are not created automatically, nor are they maintained automatically. It’s up to you to ensure your statistics are up to date.
Statistics in Azure SQL Data Warehouse
It’s worth re-emphasizing the point that there is no automatic creation of statistics in Azure SQL Data Warehouse. This means you need to plan to add statistics to your tables. Microsoft recommends an easy method is to add them to every column. While that does make things easy, it’s not like statistics are free. If you add them to every column, you’ll also have to maintain them, on every column. That could be costly. I’d rather see a targeted approach, but that will require additional work and a good understanding of how the Data Warehouse is being used. That’s not always possible, hence Microsoft’s suggestion. Your methods of data loading to your Warehouse are also going to drive your statistics maintenance routines. It’s a complex dance that goes beyond the scope of this blog post. To read more about what Microsoft has to say about statistics in Azure SQL Data Warehouse, read this article.
Because we’ve moved into a Platform as a Service (PaaS) offering with Azure SQL Data Warehouse, we don’t have the types of automation mechanisms we’re accustomed to. By that I mean, there is no SQL Agent. Instead, we have Azure Automation:
I have an Azure SQL Data Warehouse I’m using for teaching this topic while on SQL Cruise. Let’s go ahead and add an automation account:
NOTE: The most important habit you can start with in Azure is putting everything into discrete, planned, Resource Groups. These make management so much easier.
Once the account is set, the first thing you need is to create a Runbook. There is a collection of them for your use within Azure. None of them are immediately applicable for what I need. I’m just writing a really simple Powershell script to do what I want:
$SqlUsername = 'Grant'
$SqlPwd = 'xxx'
# Define the connection to the SQL Database
$Conn = New-Object System.Data.SqlClient.SqlConnection("Server=tcp:cruise2017.database.windows.net;Database=Cruisers;User ID=$SqlUsername;Password=$SqlPwd;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;")
# Open the SQL connection
# Define the SQL command to run. In this case we are getting the number of rows in the table
$Cmd=new-object system.Data.SqlClient.SqlCommand("UPDATE STATISTICS dbo.DimReseller", $Conn)
# Execute the SQL command
# Close the SQL connection
Nothing to it. Yes, for a production system you’ll want to be able to loop through all your tables, pick and choose which statistics get updated, etc. I’ll leave that to you to figure out for your system. It’s straight forward stuff.
NOTE: Thanks to Denny Cherry and Mike Fal for some pointers on troubleshooting setting this up. There’s a Test Pane that you need to make liberal use of along with Write-Output commands in order to identify where things are going south.
With the Runbook complete, save it and Publish it:
Once I’ve published my Runbook, I can simply click on the Schedule button in order to decide when I want it to run:
Obviously you can pretty easily set up the schedule in all the standard methods available to you in SQL Agent. Once this is done, you’re all set. You’ve just successfully created a schedule for updating statistics in Azure SQL Data Warehouse.
While a long discussion can be had around when and where you need to create statistics inside Azure SQL Data Warehouse, there is no doubt that you will need to update those statistics. It can be just simply a part of your load process. However, if you have a trickle load, or nightly updates, you’ll want to explore using Azure Automation to make that happen.