Jan 23 2017

Data About Execution Plans

If you look at the Properties for the first operator of a graphical execution plan, you get all sorts of great information. I’ve talked about the data available there and how important it is in this older post. Checking out the properties of a plan you’re working on is a fundamental part of tuning that plan. What happens when you don’t know which plan you should be working on? What do you do, for example, if you want to see all the plans that are currently using ARITHABORT=FALSE or some other plan-affecting setting?

The “easy” answer to this question is to run an XQuery against the XML of the query plan itself. You can identify these properties and retrieve the appropriate values from within the plan. However, XQuery consumes quite a bit of resources and you might not want to run this on a production system that’s already under stress. Now what?
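As an illustration, here is a minimal sketch of that XQuery approach, assuming the standard showplan namespace; it pulls cached plans whose StatementSetOptions show ARITHABORT as false. Again, be careful where you run it:

-- A sketch of the XQuery approach; resource intensive, so test it carefully first
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT deqp.query_plan
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
WHERE deqp.query_plan.exist('//StatementSetOptions[@ARITHABORT="false"]') = 1;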

sys.dm_exec_plan_attributes

There is a DMV that isn’t used a lot of the time because the information within it frequently doesn’t have a lot of bearing on solving fundamental query tuning issues such as out of date statistics, bad or missing indexes, or poorly structured T-SQL. This DMV, sys.dm_exec_plan_attributes, contains a bunch of values that are used by the optimizer to identify a plan in cache, such as object_id (if any), database_id, and compatibility level (compat_level). In addition to these clear & easy to understand attributes, there’s one more, set_options, that’s not immediately clear.

set_options

Follow the link above and you’ll find that the set_options column is a bitmask: it packs a number of settings into a single value. I won’t argue whether this is a good (or bad) design. That’s what it is. The question is, how do we use it? Here’s a simple query that shows all the queries that have ANSI_WARNINGS set to true:

SELECT detqp.query_plan,
       depa.attribute,
       depa.value
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_text_query_plan(
                                          deqs.plan_handle,
                                          deqs.statement_start_offset,
                                          deqs.statement_end_offset
                                       ) AS detqp
CROSS APPLY sys.dm_exec_plan_attributes(deqs.plan_handle) AS depa
WHERE depa.attribute = 'set_options'
      AND (CAST(depa.value AS INT) & 16) = 16;

If you were looking for queries that didn’t have ANSI_WARNINGS, you could just change the comparison so the bit test equals 0 (& 16 = 0). Use the values from the documentation link above to check the other settings based on their bit values.
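To circle back to the opening question about ARITHABORT, a sketch of the same query might look like this, assuming the documented bit value of 4096 for ARITH_ABORT:

SELECT detqp.query_plan,
       depa.attribute,
       depa.value
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_text_query_plan(
                                          deqs.plan_handle,
                                          deqs.statement_start_offset,
                                          deqs.statement_end_offset
                                       ) AS detqp
CROSS APPLY sys.dm_exec_plan_attributes(deqs.plan_handle) AS depa
WHERE depa.attribute = 'set_options'
      --plans compiled with ARITHABORT off
      AND (CAST(depa.value AS INT) & 4096) = 0;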

NOTE: One of the values is ‘Parallel’. When I was investigating this, I became very excited that this would be a way to programmatically identify parallel execution plans. However, it’s an attribute, like the others, that determines how a plan can be compiled, not how it was. So the parallel value here just means that a given plan could be parallel, not that it is.

Conclusion

You don’t want to be completely dependent on the query plan when it comes to investigating and identifying problem queries. Instead, you want to be systematic in your approach. Using sys.dm_exec_plan_attributes, you can query for information about your queries.

 

Jan 18 2017

Scheduling Statistics Maintenance in Azure SQL Data Warehouse

The power and capabilities of Azure SQL Data Warehouse are pretty amazing. However, it’s still basically SQL Server under the covers. SQL Server still needs maintenance and one type of maintenance is keeping statistics up to date. Statistics maintenance is doubly important within Azure SQL Data Warehouse because statistics are not created automatically, nor are they maintained automatically. It’s up to you to ensure your statistics are up to date.

Statistics in Azure SQL Data Warehouse

It’s worth re-emphasizing the point that there is no automatic creation of statistics in Azure SQL Data Warehouse. This means you need to plan to add statistics to your tables. Microsoft recommends, as an easy method, adding them to every column. While that does make things easy, statistics aren’t free. If you add them to every column, you’ll also have to maintain them on every column. That could be costly. I’d rather see a targeted approach, but that requires additional work and a good understanding of how the Data Warehouse is being used. That’s not always possible, hence Microsoft’s suggestion. The way you load data into your warehouse is also going to drive your statistics maintenance routines. It’s a complex dance that goes beyond the scope of this blog post. To read more about what Microsoft has to say about statistics in Azure SQL Data Warehouse, read this article.
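As a quick illustration, creating a statistic and refreshing it later looks like this; the table and column names here are just placeholders, not from a real warehouse:

-- Create a single-column statistic on a column you filter or join on
CREATE STATISTICS stat_FactSales_CustomerKey
ON dbo.FactSales (CustomerKey);

-- Refresh every statistic on the table after a data load
UPDATE STATISTICS dbo.FactSales;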

Automating Maintenance

Because we’ve moved into a Platform as a Service (PaaS) offering with Azure SQL Data Warehouse, we don’t have the types of automation mechanisms we’re accustomed to. By that I mean, there is no SQL Agent. Instead, we have Azure Automation:

AzureAutomation

I have an Azure SQL Data Warehouse I’m using for teaching this topic while on SQL Cruise. Let’s go ahead and add an automation account:

CruiseAutomation

NOTE: The most important habit you can start with in Azure is putting everything into discrete, planned, Resource Groups. These make management so much easier.

Once the account is set up, the first thing you need to do is create a Runbook. There is a collection of them available for your use within Azure, but none of them are immediately applicable to what I need, so I’m just writing a really simple PowerShell script to do what I want:

$SqlUsername = 'Grant'
$SqlPwd = 'xxx'

# Define the connection to the SQL Database
$Conn = New-Object System.Data.SqlClient.SqlConnection("Server=tcp:cruise2017.database.windows.net;Database=Cruisers;User ID=$SqlUsername;Password=$SqlPwd;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;")
        
# Open the SQL connection
$Conn.Open()

# Define the SQL command to run. In this case we are updating statistics on a table
$Cmd=new-object system.Data.SqlClient.SqlCommand("UPDATE STATISTICS dbo.DimReseller", $Conn)

# Execute the SQL command
$Cmd.ExecuteNonQuery()

# Close the SQL connection
$Conn.Close()

Nothing to it. Yes, for a production system you’ll want to be able to loop through all your tables, pick and choose which statistics get updated, etc. I’ll leave that to you to figure out for your system. It’s straightforward stuff.
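If you want a starting point, here’s a rough sketch (not part of the Runbook above) that generates an UPDATE STATISTICS command for every user table; the Runbook could loop through the results and execute each one:

-- Build one UPDATE STATISTICS command per user table
SELECT 'UPDATE STATISTICS '
       + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) + ';' AS UpdateCommand
FROM sys.tables AS t
JOIN sys.schemas AS s
   ON s.schema_id = t.schema_id;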

NOTE: Thanks to Denny Cherry and Mike Fal for some pointers on troubleshooting setting this up. There’s a Test Pane that you need to make liberal use of along with Write-Output commands in order to identify where things are going south.

With the Runbook complete, save it and Publish it:

EditRunbook

Scheduling Automation

Once I’ve published my Runbook, I can simply click on the Schedule button in order to decide when I want it to run:

schedule

Obviously, you can easily set up the schedule using all the standard options you’re accustomed to from SQL Agent. Once this is done, you’re all set. You’ve just successfully created a schedule for updating statistics in Azure SQL Data Warehouse.

Conclusion

While a long discussion can be had around when and where you need to create statistics inside Azure SQL Data Warehouse, there is no doubt that you will need to update those statistics. It can simply be a part of your load process. However, if you have a trickle load, or nightly updates, you’ll want to explore using Azure Automation to make that happen.

Jan 10 2017

Query Optimizer and Data Definition Language Queries

Data Definition Language queries don’t go through the optimizer, right? Normally, my short answer to this question would have been yes. However, with testing comes knowledge, and I want to be sure about that statement.

I’m working with a team of people to completely rewrite the SQL Server Execution Plans book. We’ll probably be published in April or May. It’s going to be amazing. The structure will be completely different and the book will represent five years of additional knowledge in how execution plans work and how to read and interpret them since the last book was written. However, enough on that. Let’s answer the question about Data Definition Language.

First of all, we need to quickly define our terms. Data Definition Language (DDL) is the syntax for queries that build your databases and the data structures within them. If you read the documentation at MSDN, you’ll see that this includes CREATE, ALTER, DROP, ENABLE/DISABLE TRIGGER, TRUNCATE TABLE and UPDATE STATISTICS. These commands are separate from the Data Manipulation Language (DML) queries that are used to SELECT, UPDATE, DELETE and INSERT your data (amongst other things).

Testing the Data Definition Language Query

Here is an example of one of the simplest possible DDL queries:

CREATE TABLE dbo.MyNewTable (  MyNewTableID INT PRIMARY KEY IDENTITY(1, 1),
                               MyNewValue NVARCHAR(50)
                            );
GO

We can check whether or not you’re going to get an execution plan for this in two easy ways. You can capture the estimated plan, or run the query and capture the actual plan. Since most of the time, it’s preferable to work with the added runtime information that an actual plan provides, let’s start there. However, if you execute the query above and try to capture an actual plan, you will not get one. So, let’s capture the estimated plan instead. Here’s what we see:

estimatedplan

Not much of a plan really. The devil is in the details though, so let’s open the Properties page (always use the Properties page, the ToolTip is just too unreliable in the amount of information displayed):

Data Definition Language properties

That is ALL the properties that are available. The implication is simple: SQL Server doesn’t create execution plans for DDL statements.

Further, the lack of a plan suggests that these queries must not go through the query optimization process. We can check this. There is a Dynamic Management View (DMV), sys.dm_exec_query_optimizer_info, that shows aggregate values for the work done by the optimizer. It’s a little tricky to capture the metrics of a single query, but it is possible (NOTE: don’t run this on production, please):

--clear the cache (don't do this on production)
DBCC FREEPROCCACHE();
GO

--prime the cache with the capture query so it doesn't
--register its own optimization in the comparison below
SELECT *
INTO OpInfoAfter
FROM sys.dm_exec_query_optimizer_info AS deqoi;
GO

DROP TABLE OpInfoAfter;
GO

--gather the existing optimizer information
SELECT *
INTO OpInfoBefore
FROM sys.dm_exec_query_optimizer_info AS deqoi;
GO

--run a query
CREATE TABLE dbo.MyNewTable (  MyNewTableID INT PRIMARY KEY IDENTITY(1, 1),
                               MyNewValue NVARCHAR(50)
                            );
GO

SELECT *
INTO OpInfoAfter
FROM sys.dm_exec_query_optimizer_info AS deqoi;
GO

--display the data that has changed
SELECT oia.counter,
       (oia.occurrence - oib.occurrence) AS ActualOccurrence,
       (oia.occurrence * oia.value - oib.occurrence * oib.value) AS ActualValue
FROM OpInfoBefore AS oib
JOIN OpInfoAfter AS oia
   ON oib.counter = oia.counter
WHERE oia.occurrence <> oib.occurrence;
GO

DROP TABLE OpInfoBefore;
DROP TABLE OpInfoAfter;
GO

If we run this, we won’t see any values. This is because this CREATE TABLE statement doesn’t go through the optimizer. There’s no optimization possible, so you won’t see an execution plan of any kind.

Does this mean that my first answer is correct then? Does the optimizer skip making execution plans for DDL statements?

Changing the Data Definition Language Query

Let’s modify the query so that we’re doing an ALTER table instead of a CREATE:

ALTER TABLE Sales.Customer  WITH CHECK ADD  CONSTRAINT SomeTest FOREIGN KEY(CustomerID)
REFERENCES [dbo].[Agent] ([AgentId]);

If we check the estimated and actual plans using the same methods as above, we don’t get an execution plan (well, the estimated plan is the same T-SQL placeholder that we saw). What happens if we look at what the optimizer did? This time, we get results:

counter           ActualOccurrence   ActualValue
elapsed time      2                  0
final cost        2                  0.139429282
insert stmt       1                  1
maximum DOP       2                  0
optimizations     2                  2
search 1          2                  2
search 1 tasks    2                  117
search 1 time     2                  0
tables            2                  2
tasks             2                  117
view reference    1                  1

I’ll tell you right now, I don’t know what this represents. I suspect I’d need to run the debugger to see it. Maybe it’s a plan for the CHECK process of the check constraint. The optimizer is involved in this process, twice. Two different statements were optimized. One involved inserting data and one involved referencing a view. However, at the end of the process, we still don’t see an execution plan displayed.

How About Extended Events

Can we capture the events to see what’s happening? We can try. I’ll spin up a Session and capture the following events:

query_post_execution_showplan
sql_statement_completed
sp_statement_completed
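A sketch of the session definition might look something like this; the session name and the event_file target are my choices, not from the original setup:

-- Session capturing the three events above, with causality tracking on
CREATE EVENT SESSION DDLPlanCheck
ON SERVER
    ADD EVENT sqlserver.query_post_execution_showplan,
    ADD EVENT sqlserver.sp_statement_completed,
    ADD EVENT sqlserver.sql_statement_completed
    ADD TARGET package0.event_file (SET filename = N'DDLPlanCheck')
WITH (TRACK_CAUSALITY = ON);

ALTER EVENT SESSION DDLPlanCheck ON SERVER STATE = START;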

In addition, the TRACK_CAUSALITY setting enables event correlation so that we can see the order in which events occurred. Now, I’ll try running the ALTER TABLE command to see what we can find out:

name: query_post_execution_showplan
statement: NULL
attach_activity_id.guid: 65A0A74B-E5D5-460C-846D-87808C334283
attach_activity_id.seq: 1

name: sp_statement_completed
statement: SELECT * FROM [Sales].[Customer]
attach_activity_id.guid: 65A0A74B-E5D5-460C-846D-87808C334283
attach_activity_id.seq: 2

name: sql_statement_completed
statement: ALTER TABLE Sales.Customer WITH CHECK ADD CONSTRAINT SomeTest FOREIGN KEY(CustomerID) REFERENCES [dbo].[Agent] ([AgentId])
attach_activity_id.guid: 65A0A74B-E5D5-460C-846D-87808C334283
attach_activity_id.seq: 3

And there we have it. We can see that a plan was compiled first, then a procedure call was made with a single statement: SELECT * FROM Sales.Customer. Finally, our SQL batch statement that ran the ALTER TABLE completed. Everything is grouped by the activity GUID and we have them in the sequence determined by attach_activity_id.seq. Here’s the execution plan:

ConstraintPlan

In short, there was a plan generated here, but it is, as I guessed, the plan for the check constraint process. There is no plan for the Data Definition Language statement itself.

Conclusion

Just saying something doesn’t make it true. You have to be able to prove it. From all the evidence Microsoft supplies us, DDL statements do not generate an execution plan or go through the query optimization process. We tested this by attempting to capture a plan, by observing the optimizer through the sys.dm_exec_query_optimizer_info DMV, and through Extended Events, and none of the statements tested created any kind of plan for the DDL statement.

Have I tested every variation on every possible DDL statement? No. Is it possible that other statements could have execution plans associated with them? I won’t say that’s impossible, because nothing is impossible. Therefore, I’ll simply say, that’s unlikely.

Dec 20 2016

Database Configuration

It’s amazing just how much the landscape changed with the release of SQL Server 2016 SP1. For example, I just found out that you can disable parameter sniffing at the database level using the database configuration. Not only does this work for SQL Server 2016 SP1, but it’s enabled for Azure SQL Database.

How Database Configuration Works

The syntax is very simple and documented here. So, if I want to disable parameter sniffing for a single database, I can do this:

ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;

That’s it. Done. It works from within the database and doesn’t require rebooting or anything else. Changing this setting does flush the cache of all the execution plans for that database. No other actions are necessary. You can control parameter sniffing at the database level.
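If you want to confirm the current settings for a database, a quick sketch is to query sys.database_scoped_configurations:

-- Shows the current database scoped configuration values for this database
SELECT name,
       value
FROM sys.database_scoped_configurations;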

To turn it back on, just modify the code above to set it to ON.

But Wait… Now How Much Would You Pay

There’s more. The database configuration supports a bunch of other database specific settings that are pretty interesting. You can now set the MAXDOP at the database level. That’s without any need for setting up Resource Governor. You can modify the cardinality estimation engine independently from the compatibility level. In addition, you can also control the implementation of the query optimizer hot fixes, again at the database level.
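For reference, here’s a sketch of those other settings; the values shown are illustrative only, not recommendations:

ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES = ON;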

All this makes for pretty exciting modifications at the database level to affect how your queries behave. However, Database Configuration adds to the list of things you’re going to want to check when you’re seeing differences in behavior between databases running on different servers. For example, you may see differences in execution plans between two servers because the databases have different cardinality estimator settings.

Dec 19 2016

PASS Board: Year-End Wrap-Up

This month concludes my second full year on the PASS Board of Directors and my first full year on the Executive Committee. This year has been exciting, challenging, educational, and, I truly believe, useful to the organization. We’ve accomplished a lot. We will do more.

Year In Review

I have two primary responsibilities as the Executive Vice President – Finance and Governance. Firstly, I am responsible for PASS’ budget. Secondly, I am responsible for overseeing the governance of PASS. Here’s how the year went on these two topics.

Finance

At the beginning of the year, our primary focus is to ensure that the various directors in charge of different portfolios are getting ready for the annual budget. Our fiscal year runs from July to June. A significant amount of work is spent year round managing the budget.  Board members prepare their budget requests (with guidance from the Executive Committee) in order for us to create the budget for the next fiscal year in June. That preparation along with reviewing portfolio budget proposals takes up a considerable chunk of the first half of the year. That was true of my first year in this role. We passed the budget in June, and that took care of most of my work in that area.

Most.

Every year we also go through a financial audit. This takes place after we have worked through last year’s and next year’s budgets. I had to work with the auditors to assess how we’re spending the money and managing it in general. We passed our audit with a full set of sails this year (that means we looked good).

Governance

This is a pretty broad topic. I like to joke that it means I’m the bouncer, but that’s not true. Basically, I’m responsible for running our meetings, seeing that we’re reporting on everything appropriately, publishing the minutes, and abiding by our by-laws, etc. I work with the people at HQ to ensure there is an agenda for each of the monthly board meetings. I run those meetings.

All this was largely accomplished this year, although I wasn’t able to make the January meeting in person due to a conflict with work (I was in the UK during the meeting, I attended remotely in the evenings). Other than that, I did my job as defined.

Global Growth

Under the “shall also perform such duties as from time to time may be assigned to him/her by the President” part of my job description in the by-laws, I was appointed to oversee the Global Growth Committee until June 2017 (actually, I practically begged for the job). We are a global organization with membership around the world. My goal was to ensure that every portfolio incorporated our global growth priorities into its goals. We’re about halfway through the year, and many of the goals we set out to accomplish are in place. Others are in process.

Part of my responsibility is to report to you, our members, on the financial health of the organization. While there is no specific method required for that report, we do traditionally give a financial report at PASS Summit. I did that this year and shared how we are investing our money globally through the organization in support of all our global members. I was very excited about how much we actually had to report on this topic, all the growth we’ve had through all the different PASS events. We will absolutely do more in this area.

In part, this year has been about my own personal growth into this role. I wasn’t completely prepared for it going in. It’s actually kind of like being a parent. You think you’re prepared and then the rubber meets the road and you realize, ooh, this is harder than it looked. Same thing with the EVP role. Despite a couple of bumps though, I think I have a good handle on the role.

I’m Grateful For… 

We often talk about our #sqlfamily. I do love and appreciate mine. I want to call out a few of my #passfamily here and show some public appreciation for what they do to make this organization, and all that it does, possible.

It’s probably hard sometimes to see it, but Christianson & Company, the people who run PASS HQ, do a simply amazing job. I don’t think they get enough credit at all. In no particular order, and without meaning to exclude anyone by naming names, there are a few people who have done so much for me personally that they deserve recognition (please don’t be upset if you don’t see your name here): Sandy, Janice, Georgia, Sonya, Simon, and especially Judy. I work with these people every week, sometimes every day, and I truly appreciate what they’ve added to this organization and my life.

I also want to thank Microsoft for all the support they’ve provided PASS. In fact, I truly appreciate all our various partners, especially the Global Alliance Partners. Without their support, the organization couldn’t supply all the service it provides to the community. Thank you. You make a difference.

My fellow Board members. You all rock. Period. Full stop.

The Exec. What can I say? I appreciate the faith you put in me. I hope you’re not disappointed.

I want to thank all the members of PASS. Most of all, I want to thank the critics. Those who have called us out, privately and publicly, thank you. You challenge us to be better and do more. I appreciate it. Keep it coming.

That’s it. Thanks for reading my ramblings.

Happy Yule!

Dec 12 2016

OPTIMIZE FOR Hints When Parameter Sniffing is Turned Off

While presenting recently and talking about dealing with bad parameter sniffing, I got the question: what happens to OPTIMIZE FOR hints when parameter sniffing is disabled?

This is my favorite kind of question because the answer is simple: I don’t know.

Parameter Sniffing

For those who don’t know, parameter sniffing is when SQL Server uses the precise value passed into a query as a parameter (this means stored procedures or prepared statements) to look at the statistics and generate an execution plan tailored to that value. Most of the time, parameter sniffing is either helping you, or is not hurting you. Sometimes, parameter sniffing turns bad and hurts you quite severely.

Usually, but not always, this is because you either have severely skewed data (some values return lots of rows, others very few, or vice versa), or your statistics are out of date. The real question is how best to resolve the problem when you are dealing with bad parameter sniffing.

OPTIMIZE FOR

There are a bunch of ways to deal with parameter sniffing when it goes bad. All of them revolve around controlling what type of execution plan gets generated. The three most common methods for resolving bad parameter sniffing are: plan forcing using Query Store (currently available in Azure SQL Database and SQL Server 2016; I have written extensively on this topic); using a RECOMPILE hint to get a new plan every time; or using the OPTIMIZE FOR hint to get a plan based on a specific value or on the average of values.

Using OPTIMIZE FOR is pretty straightforward. Here’s an example query that uses the OPTIMIZE FOR hint to make the optimizer build the plan for a specific parameter value:

CREATE PROC dbo.AddressByCity @City NVARCHAR(30)
AS
   SELECT a.AddressID,
      a.AddressLine1,
      a.AddressLine2,
      a.City,
      sp.Name AS StateProvinceName,
      a.PostalCode
   FROM Person.Address AS a
   JOIN Person.StateProvince AS sp
      ON a.StateProvinceID = sp.StateProvinceID
   WHERE a.City = @City 
OPTION(OPTIMIZE FOR(@City='Mentor'));

Any time this stored procedure is called and that query gets compiled, or even if the statement gets recompiled, it will use the value of ‘Mentor’ to look at the statistics and determine how many rows are likely to be returned. This lets me take control away from the parameter sniffing process.

Turning Off Parameter Sniffing Entirely

We can just completely eliminate parameter sniffing. There’s a traceflag that we can set:

DBCC TRACEON (4136,-1);

This doesn’t require a server reboot or anything. It’ll just stop using parameter sniffing… on the whole server. I strongly advocate against using this without very extensive testing to confirm that you’re not benefitting from parameter sniffing.
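If you do experiment with it, turning it back off is just as simple; a sketch:

-- Re-enable parameter sniffing across the server
DBCC TRACEOFF (4136,-1);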

Let’s remove the hint from the query above and then turn off parameter sniffing. If I capture the execution plan and look at the SELECT properties, that’s where I would normally see the Compile Time values for parameters. However, as you can see here, I don’t have a Compile Time value, just a Run Time value:

NoSniffing

What Happens To The OPTIMIZE FOR Hint

Let’s recompile the stored procedure using the OPTIMIZE FOR hint so that we can see what happens. Then, I’m going to call the procedure, but I’m going to pass a value that would result in a different execution plan:

EXEC dbo.AddressByCity @City = N'London';

Now, let’s take a look at the properties:

Sniffing

You can see that I have both a compile time value, and a run time value.

Conclusion

Even with parameter sniffing disabled, query hints do not stop working. If I used OPTIMIZE FOR UNKNOWN instead of OPTIMIZE FOR a value, the results would be the same as when I disable parameter sniffing (the optimizer just uses an average instead of specific values). However, OPTIMIZE FOR a value still pushes the optimizer to use that particular value, effectively making it sniff the parameter even though parameter sniffing has been disabled. One other point: I also tested using OPTIMIZE FOR through a Plan Guide while parameter sniffing was disabled. This worked fine too.

Dec 05 2016

Presentations for SQL Server Beginners

#iwanttohelp

Tired from all those blog posts

For my final blog post in the #gettingstarted, #iwanttohelp series, I decided to curate a set of presentations from the PASS Virtual Chapters. This content is available online. It’s free. Most importantly for the person just getting started as a SQL Server data pro, it’s good. I’m going to pair each presentation with one of my eleven blog posts in this series.

  1. The Importance of a Full Backup in SQL Server
    For this one I’m going to recommend Tim Radney’s session Understanding SQL Server Backup and Restore. I know Tim personally and guarantee this is a good session.
  2. Why Is The Server Slow
    Jes Borland is a very close personal friend and an absolutely outstanding presenter (and person). She has a session all about getting you started on wait statistics Beyond CXPacket: Understanding Wait Statistics.
  3. SQL Server Backups Are a Business Decision
    Paul Randal is, well, Paul Randal. If you’re not familiar with his work, you really should be. He presented a session called Building the Right Backup Strategy that should coincide nicely with my blog post.
  4. A View Is Not A Table
    OK, I couldn’t find a single session talking about views, so I’m going to come at this from the side, just a little (and I might start working on a presentation on views & other structures). This session by Kenneth Ureña, who I’ve talked with several times, is about structures and performance. It should do the trick. Indexing Strategies and Good Physical Designs for Performance Tuning.
  5. Use The Correct Data Type
    Silent Killers Lurking in Your Schema by (yes, a good friend) Mickey Steuwe is absolutely going to cover data types, among other things.
  6. Choosing the Right SQL Server Edition
    John Martin (yeah, another friend, what can I say, I try to stay on the good side of smart & capable people) has a presentation that’s a bit more focused on SQL Server internals but will absolutely address the edition of SQL Server as part of the work. Get SQL Server Set Up Right! The First Time.
    I’m also going to add a session by Jes Borland, again, that introduces Azure, especially talking about Azure in a hybrid scenario. Azure and SQL Server. Plus, I just want to share more Azure links. If you’re just getting started with the Microsoft Data Platform, start with Azure.
  7. Monitor Query Performance
    There are tons of choices here. I’m going with Kicking and Screaming: Replacing Profiler With Extended Events because I know that most people learn Profiler first. I also know that we need to get people to start using Extended Events instead. I will further add that Erin Stellato really knows her stuff on this topic, as well as many others. She is also a wonderful presenter (and a friend).
  8. Azure SQL Database For Your First Database
    As I said, I really believe in Azure as an entry into the land of Data Platform. It’s easy. It’s inexpensive. It’s available almost anywhere. Tim Radney has a session called Azure SQL Database for the Production DBA that will help to get you started.
  9. The Clustered Index is Vital To Your Database Design
    The clustered index is so important that everyone presents on it in one fashion or another. I’ve met Ayman El-Ghazali a few times. We’re not friends yet, but we get along (he’s very smart, see above). His session Indexing Fundamentals seems ideal for this slot.
  10. Statistics Are Vital For Performance
    Erin Stellato has a session called Statistics Starters. It is so important to understand what statistics are within SQL Server, how to use them, and how to maintain them, that it’s hard to over-emphasize the topic.
  11. PowerShell To Test a Query
    If you’re just getting going with Powershell, you’re going to need tons of help. Luckily it’s out there. I could have picked any number of excellent sessions by amazing people (many of them friends). I’m going with the PowerShell Tips and Tricks for SQL Server Administration by Mike Fal. Mike (a friend) recently helped me with some of my own PowerShell scripts, so he’s kind of top of mind at the moment.

Conclusion

For my final post I wanted to do three things. First, a recap of the last year’s worth of #gettingstarted blog posts. Believe it or not, I put a little work into each of these and it’s nice to see them all collected like this. Second, I wanted to illustrate the vast quantity of high quality learning that is available through PASS and the PASS Virtual Chapters. I pulled these sessions from a bunch of different chapters. There are alternatives I could have picked for every topic (except for views, something to work on). Which leads to my third and final reason for doing my last post in the series like this: I wanted to keep helping. You now know a great resource to go to and search through for more information and learning.

Before we go, I want to launch a small challenge of my own. If you’re a blogger, pick a topic (or a collection of topics like I did), and put together a blog post that curates the content using only PASS resources. Link back to this blog. Use the hashtag #PASScurated. Let’s see what other information can be put together.

 

 

Nov 28 2016

Database Clone

There are a bunch of ways you could create a database clone. Backup and restore is one method. Export/import is another. There are even third party tools that will help with that. However, each of these has a problem: it moves all the data, not just once, but twice. You move the data when you export it and you move it again when you import it (same thing with backup and restore). That makes these methods slow for larger databases. How can you create a database clone without moving the data multiple times?

Don’t Move the Data At All

Introduced in SQL Server 2014 SP2 and included, with expanded options, in SQL Server 2016 SP1 is a command, DBCC CLONEDATABASE. This is like a dream come true. The use is extremely simple:

DBCC CLONEDATABASE(WideWorldImporters,CloneDB);

databaselist

What happens is that it creates a new database on the server. In that database you get the full set of schema plus all the statistics and metadata. What metadata, you might ask? Query Store information is moved as part of this. The database is created read only and the file size is whatever the value is for the model database.

For example, my WideWorldImporters database is currently about 1GB worth of data. The new CloneDB that I created is 80MB in size. You can, as of SP1, control this even further by choosing not to move statistics or not to move Query Store data.
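Those SP1 options look something like this; a sketch only, and the target database name here is just an example:

-- Clone the schema only, skipping statistics and Query Store data
DBCC CLONEDATABASE (WideWorldImporters, CloneDBSchemaOnly)
    WITH NO_STATISTICS, NO_QUERYSTORE;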

And the Point Would Be?

Why would you just create an empty database with statistics and Query Store information? What is that going to do for you?

Obviously, without data, you’re not going to be troubleshooting data issues. However, what you have is a database that you can run performance tests against. Because it has the statistics and the meta data from your original database, this cloned database will behave as if it had data inside of it in terms of generating execution plans. You can run all kinds of queries and see how they would behave, without moving data around.

You’re excited now, right?

The database is read only so that you avoid a statistics update. If you were to take the database out of read only and then update the statistics, there’s no data. Therefore you would end up with statistics showing that there is no data. Any execution plans generated would not be reflective of the original database.

I’d recommend caution when using this. You may create the clone on production, but I wouldn’t suggest leaving it there or using it in tests there. Even Microsoft suggests caution. Here is the output from my clone operation:

Database cloning for ‘AdventureWorks2014’ has started with target as ‘QDSClone’.
Database cloning for ‘AdventureWorks2014’ has finished. Cloned database is ‘QDSClone’.
Database ‘QDSClone’ is a cloned database. A cloned database should be used for diagnostic purposes only and is not supported for use in a production environment.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

I’d probably back up that clone and restore it to a non-production server for any testing.

Only Move the Data Once

So, what if you really do want to have data? Is there a fast way to move it twice? Not really. However, what if you only move it once? Redgate SQL Clone is a new piece of software that does just that.

What happens is that a copy is made of your database. It’s about the size of a backup and takes just as long to create. Then, using virtual drives, that copy is mounted to the server, which happens pretty much instantly. However, the data doesn’t move at all. You’re reading from the copy of the database, not local storage. Further, more than one server can attach to the copy, so lots of servers can have a clone.

The main use for this would be for troubleshooting production data, but offline, away from the production server. If you cleaned the data and made a clone of that, you have another primary use as your development database. You can have a clean copy of production that only takes up the space of a single backup on multiple development servers. That’s a win.

Again, I wouldn’t recommend using this in a production environment either. Yes, the initial backup can be from production, but the mounted clones should not be.

This is a great way to get a clone of your database but only have to move the data a single time.

Conclusion

It is possible to get that copy of the database off your production server without having to move the data twice. If you’re only looking for performance tuning, check out DBCC CLONEDATABASE. If you also need data, take a look at SQL Clone from Redgate. Either way, it is possible to get a copy of your database faster and easier than moving the data two times.

Nov 23 2016

PowerShell to Test a Query

powershell

So you want to do some tuning, but you’re not sure how to test a query’s performance. Not a problem. Here’s a very rough script that I used to do some recent testing.

This script to test a query is post #11 of the #entrylevel #iwanttohelp effort started by Tim Ford (b|t). Read about it here.

The Script

The goal here is to load a bunch of parameter values from one table and then use those values to run a query to test it. To do this I connect up to my SQL Server instance, naturally. Then I retrieve the values I’m interested in and set up the query I want to test. Finally, I loop through the data set, calling the query once for each value.

[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | out-null
# Get the connection
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = 'Server=WIN-3SRG45GBF97\DOJO;Database=WideWorldImporters;trusted_connection=true'

# Retrieve test data
$BillToCustomerCmd = New-Object System.Data.SqlClient.SqlCommand
$BillToCustomerCmd.CommandText = "SELECT  DISTINCT i.BillToCustomerID
FROM Sales.Invoices as i;"
$BillToCustomerCmd.Connection = $SqlConnection
$SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter
$SqlAdapter.SelectCommand = $BillToCustomerCmd
$BillToCustomerList = New-Object System.Data.DataSet
$SqlAdapter.Fill($BillToCustomerList)

# Set up test query
$SQLCmd = New-Object System.Data.SqlClient.SqlCommand
$SQLCmd.Connection = $SqlConnection
$SQLCmd.CommandText = "DECLARE @sqlquery NVARCHAR(MAX);
SET @sqlquery
   = N'SELECT si.StockItemName,
   i.InvoiceDate,
   i.SalespersonPersonID
FROM Sales.Invoices AS i
JOIN Sales.InvoiceLines AS il
   ON il.InvoiceID = i.InvoiceID
JOIN Warehouse.StockItems AS si
   ON si.StockItemID = il.StockItemID
WHERE i.BillToCustomerID = @BillToCustomerID;';

DECLARE @parms NVARCHAR(MAX);
SET @parms = '@BillToCustomerID int';

EXEC sys.sp_executesql @stmt = @sqlquery,
   @params = @parms,
   @BillToCustomerID = @btc;"
$SQLCmd.Parameters.Add("@btc",[System.Data.SqlDbType]"Int")

# Run the tests
foreach($row in $BillToCustomerList.Tables[0])
{
    $SqlConnection.Open()
    $SQLCmd.Parameters["@btc"].Value = $row[0]    
    $SQLCmd.ExecuteNonQuery() | Out-Null
    $sqlconnection.Close()
    
}

I’m using ExecuteNonQuery here so I can ignore the result set because, in this case, I don’t care about it. I just want to be able to capture the query metrics (using Extended Events, naturally). If I wanted the results to come back, I could use ExecuteReader instead.

Some Explanation

This is a very simple and simplistic way to do testing. I’m not providing this as a mechanism for all your tests. I’m not suggesting this should be your primary testing tool. This is just a simple way to do some basic testing.

You can easily mix this up to get more realistic tests or add to the tests. Throw in a command to pull the query out of the cache after each call. Now you’ll see how the compile works. Change the order of the retrieved data to make it random. Toss in other queries. Run a set of other queries on a loop in a different PowerShell script to generate load. The sky is the limit once you start playing with this.
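For example, on SQL Server 2016 you could clear the plans for just the test database between iterations with something like this; a sketch, and not something to run on a production box:

-- Removes all plans for the current database so the next call compiles fresh
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;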

The reason I go to PowerShell for this instead of running all these commands as T-SQL through SSMS is because of the more direct control on behavior I get with PowerShell. The ability to ignore the result set is just one example.

Conclusion

If you really want to do load testing and evaluation, I’d suggest setting up Distributed Replay and putting it to work. I’ve used it very successfully for that kind of thorough and complete testing of a system. If you really just want to know how this one query is going to fare, the PowerShell script above will enable you to test a query through this basic test. Just remember to capture the metrics when you’re doing any kind of test so that you can compare the results.


Want to play some more with execution plans and query tuning? I’ll be doing an all-day seminar on execution plans and query tuning before SQLSaturday Providence in Rhode Island, December 2016, so if you’re interested, sign up here.

Nov 22 2016

SQL Server on Linux: Boring

As a concept, SQL Server on Linux is thrilling. The OS no longer stands in the way of taking advantage of everything that SQL Server offers. Because we now have Linux as an option, we have opportunity for all sorts of new fun.

I first saw SQL Server running on Linux about six months ago. The install was shockingly easy. I saw a few command line commands run and it all looked good. I didn’t have the opportunity to set up my own full blown install then, but I do now.

My Installation

Today I did all the work necessary to get this thing running. I got a copy of Ubuntu Desktop and put it on VMWare Workstation in a virtual machine. That was a painless process. SIDE NOTE: It was a painless process because VMWare recognized the OS I was installing and was therefore able to help me set it up. Full props to VMWare.

The SQL Server install from the terminal went very well. After a few typos on my part, done. I got the IP address of the machine, swapped back over to a VM running Windows that had SSMS installed. Then, I connected up and it all just…. worked…

Redgate SQL Compare connected to Linux

Suddenly though, it was boring. I tested a few Redgate tools (SQL Compare is right there above, connected to Linux) to be sure they worked. No major issues encountered. Great. However, now, it’s just another instance of SQL Server.

I guess I could start complaining that SQL Agent isn’t there… except I’m not that big a fan of SQL Agent, and I can schedule all sorts of stuff to run in Linux using the sqlcmd command line tool.

I find it actually thrilling and exciting that SQL Server on Linux is boring. SQL Server on Linux is an early community preview and it’s already dull. The boredom, the dullness, just the plain old ordinariness of it, means that Microsoft has done a good job.