Jul 02 2014

Statistics and Natively Compiled Procedures

Statistics are one of the single most important driving factors for the behavior of the query optimizer. The cardinality estimates stored within the statistics drive costing and costing drives the decision making of the optimizer. So, how does this work with the new SQL Server 2014 natively compiled procedures? Differently.

In-memory tables do not maintain their statistics automatically. Further, you can’t run DBCC SHOW_STATISTICS to get information about those statistics, so you can’t tell whether they’re out of date or what the distribution of the data within them looks like. So, if I create some memory-optimized tables, skip loading any data into them and then run this standard query:

SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = 'London';
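
Before we get to the plan, one note on setup: the table scripts aren’t shown in this post, but a memory-optimized version of dbo.Address might look something like this sketch (the column list, BIN2 collation and bucket count are assumptions, not the original script, and this is the later version that includes the index on City):

CREATE TABLE dbo.Address
    (
     AddressID INT IDENTITY(1, 1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 50000),
     AddressLine1 NVARCHAR(60) NOT NULL,
     City NVARCHAR(30) COLLATE Latin1_General_100_BIN2 NOT NULL, --string index key columns need a BIN2 collation in 2014
     StateProvinceID INT NOT NULL,
     PostalCode NVARCHAR(15) NOT NULL,
     INDEX ix_City NONCLUSTERED (City) --range index to support the seek on City
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);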

The estimates in the execution plan for the query show some pretty wild values:

Estimates

That’s an estimated number of rows of 65,536 and an actual of zero in a table scan because I created my table without an index. If I recreate it with an index, but still no data, the estimates change and we have an index seek operation now:

EstimatesIndex

That’s suggesting that the optimizer thinks there are 256 rows. But there’s no data here. So, let’s load some data into the table. Here are the new estimates from the index seek operator:

estimateswdata

I haven’t yet updated the statistics, so the optimizer still thinks there are zero rows in the table, or at least, it has no statistics. Well, not true:

SELECT  s.name,
        s.auto_created,
        s.user_created,
        s.filter_definition,
        sc.column_id,
        c.name AS ColumnName
FROM    sys.stats AS s
        JOIN sys.stats_columns AS sc
        ON sc.stats_id = s.stats_id
           AND sc.object_id = s.object_id
        JOIN sys.columns AS c
        ON c.column_id = sc.column_id
           AND c.object_id = s.object_id
WHERE   s.object_id = OBJECT_ID('dbo.Address');

If I run this, I’ll see statistics on the table, including system generated statistics:

stats

One point, while statistics don’t update automatically, they clearly can still be created automatically. But I can’t run DBCC SHOW_STATISTICS to see what’s in there. So, let’s see what estimates look like in the natively compiled procedure. I’ll take the same query code above and convert it to a proc. Then, when I capture the estimated plan from the procedure (no actual plans allowed), the index seek operator shows these estimates:

estimatescompiled

Now, we have an accurate estimate on the number of rows this query is likely to return. Interesting. So, let’s drop the procedure, update the statistics and then rerun the query and the procedure. First the query. The estimates don’t change. I’m still seeing estimated values as 256 while the actual is 434. So, let’s free the procedure cache and try again:

EstimatesUpdated

Ah, there we go. The plan itself came out the same way, but we clearly have more accurate estimates now. On to the procedure. I’ll recreate it and then get the estimated plan. Here are the estimate values from the same index seek operation:

Estimatescompiledupdated

Oops. Still estimated 0 rows.
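
For reference, the statistics update and the plan cache flush in that sequence look something like this (a sketch; note that DBCC FREEPROCCACHE clears the entire plan cache for the instance, so don’t run it casually on a production box):

UPDATE STATISTICS dbo.Address WITH FULLSCAN, NORECOMPUTE; --FULLSCAN and NORECOMPUTE are required for memory-optimized tables in 2014
GO
DBCC FREEPROCCACHE; --flush the plan cache so the ad hoc query compiles again with the new statistics
GO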

What’s all this mean? I’m not sure. The documentation from Microsoft in this area is sketchy. During the most recent 24 Hours of PASS, I was able to ask Microsoft about the impact of statistics on natively compiled plans. They suggested that it was not necessarily going to be the same as we see in standard queries. These tests make that fairly evident. Also, it looks like the default estimated number of rows for in-memory tables is different. If I create standard tables, empty, and run the same query against them, the estimated number of rows is what I expect, 1. But for in-memory tables it’s 256 with an index and 65,536 without one (or at least that’s what I’m seeing). However, the estimates for the natively compiled procedure never changed in this test case; they stayed at 0. This is hardly shocking, but it seems that different rules apply to in-memory tables and their statistics, as well as to natively compiled procedures and their consumption of those statistics. And, just as Microsoft changed the default estimated number of rows for table variables from 1 to 100 in SQL Server 2014, it seems we have another instance where they’re defaulting to a higher value, and one where the values seem to just disappear.

The behavior of statistics within in-memory tables is extremely interesting because you may see changing behavior with those tables as your queries get more complex and your data changes. It makes a very strong case for making sure that you update your statistics on a regular basis on these tables.
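
Since there’s no automatic update to fall back on, something as simple as this, run on a schedule, covers the basics (a sketch that just walks every memory-optimized table in the current database; adjust to taste):

DECLARE @sql NVARCHAR(MAX) = N'';

SELECT  @sql += N'UPDATE STATISTICS ' + QUOTENAME(s.name) + N'.'
        + QUOTENAME(t.name) + N' WITH FULLSCAN, NORECOMPUTE;' + CHAR(13)
FROM    sys.tables AS t
        JOIN sys.schemas AS s
        ON s.schema_id = t.schema_id
WHERE   t.is_memory_optimized = 1;

EXEC sys.sp_executesql @sql;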

I’m taking this show on the road. If you want to get an all day class on query tuning, I’ve got a lot of opportunities coming up. I believe that Albany, on July 25th, is not yet sold out. You can register here. I’m teaching an all day session at SQL Connections in September in Las Vegas. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. I’m excited, and more than a little humbled, to get the opportunity to present an all day pre-conference seminar at the PASS Summit in Seattle in November. Go here to register.

Jun 17 2014

Natively Compiled Procedures and Bad Execution Plans

I’ve been exploring how natively compiled procedures are portrayed within execution plans. There have been two previous posts on the topic, the first discussing the differences in the first operator, the second discussing the differences everywhere else. Now, I’m really interested in generating bad execution plans. But here’s the interesting thing: I wasn’t able to. Or rather, I couldn’t see evidence of plans changing based on silly things I did to my queries and data. To start with, here’s a procedure:

CREATE PROC [dbo].[AddressDetails] @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

And this is a nearly identical procedure, but with some stupid stuff put in:

CREATE PROC [dbo].[BadAddressDetails] @City VARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

I’ve changed the primary filter parameter to a VARCHAR when the data is NVARCHAR. This difference is likely to lead to differences in the execution plan, although not necessarily. If I load my tables up and update my statistics, then create the procedures and run them both with the same parameter values, I should detect any differences, right? Here’s the resulting execution plan:

ActualPlan

It’s an identical plan for both procedures. In fact, the only difference in the plans that I can find is a CAST in the Index Seek operator for the BadAddressDetails procedure, as expected. Beyond that, the plans show no other differences. However, execution is something else entirely. And this is where things get a little strange. There are two ways to execute a procedure:

EXEC dbo.AddressDetails @City = 'London';
EXEC dbo.AddressDetails 'London';

Interestingly enough, the first one is considered to be the slow way of passing a parameter. The second one is the preferred mechanism for natively compiled procedures. Now, if I execute these two versions of calling the procedure, I actually see different performance. The first call, the slow one, runs somewhere in the neighborhood of 342 µs. The other ran in about 255 µs. Granted, we’re only talking about ~100 µs, but we’re also talking about a 25% speed increase, and that’s HUGE! But that’s not the weird bit. The weird bit was that when I ran the good and bad queries together, the slow call on the bad query was consistently faster than the slow call on the good query. The fast call reversed that trend. And, speaking of which, the bad query, with the CAST, ran in about 356 µs, or roughly 25% slower.
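
If you want to reproduce that kind of measurement, timings at that granularity come from Extended Events rather than the plan. This is a sketch, not necessarily how the timings above were captured, and the session and file names are made up:

CREATE EVENT SESSION ProcTimings ON SERVER
ADD EVENT sqlserver.rpc_completed,       --covers calls that arrive as RPC
ADD EVENT sqlserver.sql_batch_completed  --covers EXEC run as an ad hoc batch
ADD TARGET package0.event_file (SET filename = N'ProcTimings');
GO
ALTER EVENT SESSION ProcTimings ON SERVER STATE = START;
GO

Both events report duration in microseconds, which is the scale you need for natively compiled procedures.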

The execution plan really didn’t show any indication that this would be slower, which made me do the next thing I did. I updated my Address table so that all the values were equal to ‘London.’ Then, because statistics are not maintained on in-memory tables automatically, I updated the statistics:

UPDATE STATISTICS dbo.Address WITH FULLSCAN, NORECOMPUTE;
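
The data change itself was nothing exotic, just a plain interop UPDATE along these lines (a sketch):

UPDATE  dbo.Address
SET     City = N'London'; --every row now matches the filter, so the predicate is no longer selective at all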

With the statistics up to date, I dropped and recreated the procedure (there is no recompile with natively compiled procedures, something to keep in mind… maybe, more in a second). So now, the selectivity on the index was 1. The most likely outcome: an index scan. Guess what happened? Nothing. The execution plan was the same. I then went nuts: I converted all my tables so that a horrific mishmash of data would be brought back instead of clean data sets, I put data conversions in, and… nothing. Index Seeks and Nested Loops joins. Weirdness.
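
Since sp_recompile doesn’t apply here, forcing a fresh compilation really does mean dropping and recreating the procedure (a sketch):

DROP PROCEDURE dbo.AddressDetails;
GO
--...then rerun the CREATE PROC dbo.AddressDetails statement from the top of
--this post so the new DLL is compiled against the updated statistics.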

I’m actually unsure why this is happening. I’m going to do more experimenting with it to try to figure out what’s up. But that lack of a recompile maybe doesn’t matter if, regardless of data distribution, you’re going to get the same plan anyway. I’m really not positive that looking at the execution plan for natively compiled procedures does much of anything right now. However, these tests were a little bit subtle. I’ll load up more data, get a more complex query and then really mess around with the code to see what happens. I’ll post more of my experiments soon.

I promise not to experiment on you though when I’m teaching my all day query tuning seminars. There are a bunch coming up, so if you’re interested in learning more, here’s where to go.  Just a couple of days left before Louisville and I’m not sure if there’s room or not, but it’s happening on the 20th of June. Go here to register. Albany will be on July 25th, but we’re almost full there as well. You can register here. SQL Connections is a pretty cool event that takes place in September in Las Vegas. In addition to regular sessions I’ll be presenting an all-day session on query tuning on the Friday of the event. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event. Let’s get together and talk.


Jun 10 2014

Differences In Native Compiled Procedures Execution Plans

All the wonderful functionality that in-memory tables and natively compiled procedures provide in SQL Server 2014 is pretty cool. But, changes to the core of the engine result in changes to things that we may have developed a level of comfort with. In my post last week I pointed out that you can’t see an actual execution plan for natively compiled procedures. There are more changes than just the type of execution plan available. There are also changes to the information available within the plans themselves. For example, I have a couple of stored procedures, one running in AdventureWorks2012 and one in an in-memory enabled database with a few copies of AdventureWorks tables:

--natively compiled
CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

--standard
CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
AS
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    Person.Address  AS a
                JOIN Person.StateProvince  AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN Person.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
GO

The execution plans are obviously a little bit different, one going against in-memory tables and indexes and the other going against standard ones. However, that’s not the point here. This is the point. One of the first things I always check when looking at a new execution plan is the first operator, the SELECT/INSERT/UPDATE/DELETE operator. Here it is from the estimated plan of the query against the standard tables:

StandardSelectProperties

All the juicy goodness of the details is on display including the Optimization Level and Reason for Early Termination, row estimates, etc. It’s a great overview of how the plan was put together by the optimizer, some of the choices made, useful information such as the parameters used, etc. It’s great. Here’s the same thing for the natively compiled procedure:

NativeSelectProperties

Uhm… where are all my wonderful details? I mean, honestly, everything is gone. All of it. Further, what’s left, I’m pretty sure, is nothing but a lie. Zero cost? No, not really, but the cost is obviously not coming from the standard optimizer estimates either, so, effectively, zero. I’m pretty sure Physical Operation is just there as an oversight. In short, this is a different game. Yes, you will still need to evaluate execution plans for natively compiled procedures, but we’re talking a whole different approach now. I mean, great googly moogly, there are no parameter compile-time values. Are those just ignored now? Are the days of bad parameter sniffing behind us, or are the days of good parameter sniffing gone forever? And it’s not just the SELECT operator. Here are the properties for a Nested Loops operator. First the standard set:

StandardNestedLoops

And, the natively compiled procedure:

NativeNestedLoops

Now, except for the fact that everything is FREE, the differences here are easier to explain. Execution Mode is applicable to columnstore indexes, and none of those are available yet in in-memory storage, so I’m not shocked to see that property removed. Same for the others. But this complete lack of costing is going to make using execution plans, always a problematic proposition with only estimated values available for so many things, even harder. It might even make it so that all you really need to do is look at the graphical plan. Drilling down on the properties, until meaningful data starts to appear there, might be a waste of time for natively compiled procedures.

I’ll keep working on these. Next up, can you get a “bad” execution plan with a natively compiled procedure? We’ll find out.

Just a reminder that I’m taking this show on the road. I’m doing a number of all day seminars on query tuning at various events in multiple countries. Louisville has almost filled the room we have available on the 20th of June. Go here to register.  But don’t wait. I’m also going to be in Albany on July 25th, but we’re almost full there as well. You can register here. If you were thinking about attending SQL Connections in September in Las Vegas, In addition to regular sessions I’ll be doing a day on query tuning. Go here to register for this great event. In Belgium in October, I’ll be doing an all day session on execution plans at SQL Server Days. Go here to register for this event.


Jun 05 2014

Natively Compiled Procedures and Execution Plans

The combination of in-memory tables and natively compiled procedures in SQL Server 2014 makes for some seriously screaming fast performance. Add in all the cool functionality around optimistic locking, hash indexes and all the rest, and we’re talking about a fundamental shift in behavior. But… Ah, you knew that was coming. But, you can still write bad T-SQL or your statistics can get out of date or you can choose the wrong index, or any of the other standard problems that come up that can negatively impact all those lovely performance enhancements. Then what? Well, same as before, take a look at the execution plan to understand how the optimizer has resolved your queries. But… Yeah, another one. But, things are a little different with the natively compiled procedures and execution plans. I have a very simple little example in-memory database with just a few tables put up into memory and a straightforward procedure that I’ve natively compiled:

CREATE PROC dbo.AddressDetails @City NVARCHAR(30)
    WITH NATIVE_COMPILATION,
         SCHEMABINDING,
         EXECUTE AS OWNER
AS
    BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        SELECT  a.AddressLine1,
                a.City,
                a.PostalCode,
                sp.Name AS StateProvinceName,
                cr.Name AS CountryName
        FROM    dbo.Address AS a
                JOIN dbo.StateProvince AS sp
                ON sp.StateProvinceID = a.StateProvinceID
                JOIN dbo.CountryRegion AS cr
                ON cr.CountryRegionCode = sp.CountryRegionCode
        WHERE   a.City = @City;
    END
GO

The fun thing is, even with these really small examples, the performance differences are staggering when compared to standard tables or just in-memory tables alone. Anyway, this is what the estimated plan for this procedure looks like:

ActualPlan

Looks like a pretty standard execution plan right? Here’s the actual plan:

 

 

No, don’t bother refreshing your browser, that’s just a blank couple of lines because there is no actual plan. You’re not dealing with a standard query, remember. There are only a couple of reasons to get an actual plan. First, if you’re experiencing recompiles, you might want to see the plan that was ultimately executed. An actual plan will reflect this, as will a plan pulled from cache. Second, you want to see some of the run-time metrics: actual rows, actual executions, runtime parameter values. Well, the first is not an issue since you’re not going to see these things recompile. It’s a DLL. The second could be an issue. I’d like to see actual versus estimated to understand how the optimizer made its choices. Regardless, the actual plan won’t generate in SSMS when you execute the natively compiled procedure.
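
For the record, you can capture the estimated plan in T-SQL as well as through the SSMS button. A sketch (SHOWPLAN_XML returns the plan XML without actually executing the procedure):

SET SHOWPLAN_XML ON;
GO
EXEC dbo.AddressDetails @City = N'London';
GO
SET SHOWPLAN_XML OFF;
GO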

There are some more differences between the plans for natively compiled procedures and standard procedures. I’ll go over a few more in another blog post.

Hey, if you do want to talk query tuning? I’m taking my one day seminar on the road to a bunch of different events. There’s still time to get to the event in Louisville on the 20th of June. That’s right before the SQL Saturday there. Go here to register. I’m also going to be putting this on the day before SQL Saturday Albany. You can register here. I’m very honored to have been selected to speak at SQL Connections in September in Las Vegas. This includes an all day seminar on query tuning. Go here to register for this great event. I’m also very excited to be able to say that I’m also going to be doing a different seminar in Belgium for SQL Server Days. I’ll be presenting an all day seminar on execution plans, including lots of details on SQL Server 2014. Go here to register for this event.

That’s four opportunities to get together and spend an entire day talking about query tuning, execution plans, statistics, the optimizer, extended events, oh, all sorts of things.

Apr 01 2014

SQL Server 2014 New Defaults

Celebrating

Today, April 1st, 2014, marks the release of SQL Server 2014. There are tons and tons of great new methods and functions and processes within the product. We’re all going to be learning about them for quite a while to come. One of the most exciting though is one of the changes to the defaults. In the past there’s been a lot of debate around how best to configure your databases. What cost threshold should be set for parallelism, the max degree of parallelism, memory settings, auto growth, and all sorts of other settings affect how your databases work. But, Microsoft has finally done something smart. They’ve bowed to the pressure of hundreds and hundreds of DBAs, Database Developers and Developers around the world. They’ve finally done the one thing that will improve everyone’s code once and for all. Expect to see massive performance improvements in SQL Server 2014 thanks to this one default change.

What have they done you ask? What miracle is this that is going to result in both better code and better performance? Simple: by default, all connections to the database are now using the transaction isolation level of READ_UNCOMMITTED. In a single stroke, we are no longer forced to put WITH NOLOCK on every single table reference in every single query. All the pain and suffering caused by blocking from locks has been removed from the product. We can look forward to a much cleaner code base and better query performance. Thanks Microsoft.

Please, note the date carefully.

Feb 12 2014

SQL Server 2014 and the New Cardinality Estimator

Cardinality, basically the number of rows being processed by an operation within the optimizer, is a calculation predicated on the statistics available for the columns in question. The statistics used are generally either the values from the histogram or the density. Prior to SQL Server 2014, and going all the way back to SQL Server 7.0 (in the Dark Ages when we had to walk uphill to our cubicles through 15 feet of snow battling Oracle DBAs and Fenris the whole way), there’s been one cardinality estimator (although you can modify the behavior somewhat with a trace flag in 2008R2 and 2012). Not any more. There’s a possibility, for really complex, edge-case queries, that you may run into a regression from this.

You control whether or not you get the new cardinality estimator by setting the Compatibility Level of the database to SQL Server 2014 (120 for the picky amongst us). This could lead to regression issues. So, you’re going to pretty quickly want to know if your execution plan is using the new Cardinality Estimation Model, right? It’s tricky. Just look at the properties of the first operator in the plan (I told you to use that first operator). You’ll find one value there that will tell you what you need to know:

CardinalityEstimator

Just check this value (which you can also get from the XML behind the graphical plan) to see what calculations the optimizer used to arrive at the plan you’re observing.
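
If you prefer T-SQL, this is roughly what it looks like (a sketch; the database name is just an example, and the value shows up in the plan XML as the CardinalityEstimationModelVersion attribute on the statement element):

ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL = 120; --opt in to the new estimator
GO
--In the plan XML behind the graphical plan, look for something like:
--  CardinalityEstimationModelVersion="120"   (new estimator)
--  CardinalityEstimationModelVersion="70"    (old estimator)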

Dec 19 2013

How to Set Up Managed Backups in SQL Server 2014

Earlier this week I introduced the concept of Managed Backups (and caused less of a turmoil than I thought I would). Now I want to show you how it works. It’s really simple and quite well documented. Before you get to the insanely simple task of actually enabling Managed Backup, you will need to go through the prerequisites. First, and this should be obvious, but I’ll state it just in case: you need to set up an Azure storage account. That’s so insanely straightforward that I’m not going to say more. Then, you have to set up encryption on your system. I used these commands to prep it:

CREATE MASTER KEY ENCRYPTION BY PASSWORD =
    '$qlserver2012queryperformancetuning';

CREATE CERTIFICATE CloudDojoCert
WITH SUBJECT = 'Backup Encryption Certificate';

Again, shouldn’t have to say this, but covering my bases, don’t lose that password. You might need it later. You’ll also need to create a credential (I had one created for Backup to URL already):

CREATE CREDENTIAL MyCredentialName
WITH IDENTITY = 'MyStorageAccountName',
SECRET = 'MyAccessKey';

The actual command to get the backups started is crazy simple:

EXEC smart_admin.sp_set_db_backup @database_name = 'MyDatabase',
    @retention_days = 10, @credential_name = 'CloudDojo',
    @encryption_algorithm = 'AES_128', @encryptor_type = 'Certificate',
    @encryptor_name = 'CloudDojoCert', @enable_backup = 1;

That’s it. By setting @enable_backup to 1, I’m up and running. I can repeat the script for multiple databases and I’m done. Backups are now automatic. There’s auditing built in. For example, I seem to get errors on RESTORE HEADERONLY occasionally:

DECLARE @eventresult TABLE
    (
     event_type NVARCHAR(512),
     event NVARCHAR(512),
     timestamp DATETIME
    )

INSERT  INTO @eventresult
        EXEC smart_admin.sp_get_backup_diagnostics @begin_time = '12/1/2013',
            @end_time = '12/19/2013';

SELECT  *
FROM    @eventresult
WHERE   event_type LIKE '%admin%';

errors

[FileRetentionAdminXevent] Type = SqlError, Code = 3292, Stage = Undefined, Message = A failure occurred while attempting to execute Backup or Restore with a URL device specified. Consult the Windows Event Log for details.
RESTORE HEADERONLY is terminating abnormally., Additional Info = N/A.

Browsing through the files (using Cerebrata because it makes my life so much easier), I can see that it’s an intermittent problem, not a consistent one. And, since it’s just the verification of the backup, I’m not sweating it that much:

files

It’s the kind of intermittent error you’re likely to get when working with cloud-based technology. Code accordingly.

But that’s it. It’s so easy to get set up. You can see that I’ve been running backups for days. Managed Backups are easy to set up. They’re easy to turn on. You can use RESTORE FROM URL to pull these back into your database just as easily as I put them out there. There’s really nothing to it.
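
And if you want to double-check what a given database is currently configured for, there’s a function for that too (a sketch, using the same made-up database name as above):

SELECT  *
FROM    msdb.smart_admin.fn_backup_db_config('MyDatabase');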

Again, this is not for everyone. It’s just another tool in the toolbox. Yet another way of getting things done. Explore it. Let me know what you think.

Dec 17 2013

Introducing Managed Backups in SQL Server 2014

Some of the new functionality of 2014 is straight forward, non-controversial and easily welcomed by the community. Think, updateable column store indexes. Some of the new functionality is going to raise an eyebrow or three (most of the time, not on one person, but you know the #sqlfamily, we have some interesting mutations represented). Think… managed backups.

Now, why on earth would a process that takes backups for you, including log backups, does it automatically on a schedule and/or when data has changed sufficiently, stores them offsite for protection, and is easy to set up and maintain cause so much controversy? Hey, I’m wrong, it won’t, move along… Or, maybe, I’m right, and this is going to raise more than eyebrows.

We’re talking about surrendering control over your backup processes and letting Microsoft decide if and when you need a full backup or a log backup. We’re also talking about surrendering control over your storage and letting Microsoft manage your files in Azure Blob Storage. OMG! I said Azure. Several people vomited, just a little, into their mouths. Yep, Microsoft is going to take away your backups and your ability to control them. It will then store them in a way you can’t access and Microsoft will not let you have any control over them ever again…

Oh good gosh, really? You believe that? I do have a bridge for sale that I can let you have pretty cheap.

Let’s talk about what managed backup really is, who it might be for, where it could work extremely well and where it might not work well at all.

Managed backup is a service running on your system that is enabled, if you so desire, through T-SQL commands. That’s right, it’s not even turned on if you don’t want it to be. The service is pretty simple. When you enable it, you define a few parameters: which database or databases, and a retention period. That’s pretty much it. From there it runs your first full backup on each of the databases you’ve put under management (no system databases, and the backup size has to be less than 1 TB, a limitation of Azure Blob Storage). Then, it runs log backups at least every two hours, or after 5 MB or more of transactions. Heck, if your log backup chain is broken, it’ll reset and start over. It’s smart. It’ll also run a full backup after 1 GB of data growth, or once a week. And again, if you break the log backup chain, it’ll take a full and keep going. All the files go to Azure Blob Storage where you can use backup/restore from a URL to retrieve them. The backup format is, wait for it, SQL Server backups. Because they’re SQL Server backups you can restore them to a SQL Server machine. That’s it. It’s very simple and simplistic.

Who this is for, well, if you’re one of those people with the funny taste in your mouth right now, probably not you (although, you know I think you should start learning it). If you’re managing one of the larger database systems, over 1 TB in size, you’re out. In fact, if you have a sophisticated backup routine in place now combining differentials and file/filegroup backups, you’re out as well. This is meant for those people who don’t want to or can’t take all the time and trouble to set up a sophisticated backup routine. It’s for people who are already running their servers on Azure VMs. It’s for small shops that don’t have the money to set up multiple off-site storage locations for their backups. It’s for consultants who want a really simple and reliable method for getting clients to have protected backups with off-site storage with a minimal number of steps. In short, there are a lot of people who are going to find this pretty handy. Again, not everyone. But that’s OK, every feature isn’t for everyone either. That mom & pop database that’s about 8 GB in size and running on a 32-bit server just isn’t going to benefit from in-memory table storage either. Does that negate that functionality? Nope. Nor should the fact that some large enterprise clients will never use this.

If you’ve never set up log backups and aren’t sure how often your system should have them, this is going to be pretty handy since it does it for you. If you don’t have access to a DBA… well, you may still need one because setting up is T-SQL only currently, no GUI, so it might not work well for everyone, but I’m sure someone will write an app or a script for it soon. If you’re already running your databases on Azure VMs, this thing is fast. Really fast. In fact, I think it works quicker than my local disks (the spinning ones, not the SSDs). And, the storage is right there in the cloud, so you won’t have to pay for transfer down to your machine (although plucking a weekly backup out of the blob storage might be a good idea).

Now, there are a few things I don’t like about it. First, I’d like to have the option to back up locally first and then copy to the cloud (that’s how Red Gate implemented their cloud backup solution and it works great). And, I’d like to be able to tweak the knobs on when the backups occur. I’m also still not convinced I like how the monitoring is done with it, although since the schedule is maintained by SQL Agent, you can set up alerts, secondary monitors, all that fun stuff, all on your own. While some sophisticated DBA types will absolutely use this, I think the prime audience is non-DBAs. As such, it needs a clean & extremely simple GUI; that would get it implemented far more widely. Heck, I’d say it ought to be a check box on the SQL Server install process, and one you have to uncheck to prevent getting it set up when you install the instance. That way, Next-Next-Next deployments that probably never even consider recovery models or any of the rest of that fun stuff, will get backups as part of their installation.

Overall though, I’m excited about the added functionality. I’m especially excited because this functionality is added. It’s not replacing anything. I can still do all my backups my way, completely and utterly. Or, I have another automation tool that I can put to work. I’ve been running this on a test platform for over a month with no issues (while running other tests for other new functionality, I’m finally digging into 2014). I’ll write another post on how to configure this, although I don’t really need to because the Books Online are pretty clear. But, I wanted to get the “They’re taking away my control and putting it all on Azure” freak out done before I posted the technical part of the info. Proceed with the meltdown.

Dec 11 2013

SQL Server 2014 Backup to URL

I’m absolutely in love with the concept of being able to create a backup directly to a protected, off-site location. Yeah, you can spend all sorts of money on terribly wonderful technology to enable that within your enterprise. And if you have that kind of money, great. But, what if you’re like most everyone else and you just want a little more protection without mortgaging the house? Let’s take a look at one possibility, backup to URL.

There have been ways to back up to hosted storage, whether it was DropBox, AWS or Azure blob storage, for quite a while. But, every method I tried out involved setting up some type of drive on your system. As soon as you had your K:\ drive mapped out to AWS or whatever, you could run a SQL Server backup directly to that location. Done. But, I’ve long wanted to just do it straight. Just to back up right to that location, no additional spoofing required. As long as you’re willing to keep your stuff within Azure blob storage, it’s now possible. Further, within SQL Server 2014 you can do it directly from T-SQL, no additional application required (there is an app for 2005, the 2008s and 2012). The how is easy. Let’s talk about the why for a minute.

Backups are not a technological problem. Oh, I know they can both literally and figuratively keep us nerds up at night, but the fundamental definition of the backup is something that the business must provide. They need to tell us how much data to risk, how long we can spend on recovery through things such as the Recovery Point Objective and Recovery Time Objective. I don’t want to get too deep in the weeds defining these terms. What I do want to point out is that your disk is at risk. No, not the one that’s storing your database (although, it’s at risk too). I’m talking about the one storing your backups. That thing is just waiting until the most opportune moment to die. And by opportune I mean of course, when it will be the hugest pain for you. Plus, are you in a flood zone? A wind zone? Earthquakes? Fire? Oh yeah, the gods and the universe well and truly hate your orderly little backup disk. So, it’s a good idea to have a second location for those backups. Back in the day, we paid a company to take our tapes to an off-site storage. Now, you can just go to a hosted storage site like, oh, I don’t know, Azure blob storage.

Here’s how it works. You have to have already set up an Azure account. You should also create a container to keep your backups in (if you need help dealing with blob storage, check out Cerebrata). While you’re there, make sure you get the URL for your storage location and the Access Key for your storage. Now, let’s go back to T-SQL. First, we need to create a CREDENTIAL. This gets us onto our URL without having to paste sensitive information into our backup processes:

CREATE CREDENTIAL MyCredentialName
WITH IDENTITY = 'MyStorageAccountName',
SECRET = 'MyAccessKey';

With that done, the rest is easy:

BACKUP DATABASE MyNewDB
TO URL = N'http://myserver.blob.core.windows.net/scarybu/MyNewDB.bak'
WITH CREDENTIAL = N'MyCredentialName',
     NAME = N'MyNewDB-Full Database Backup',
     STATS = 10;

The syntax is pretty standard. You’re going to BACKUP a DATABASE named MyNewDB to your URL. In the URL, make sure you supply, not just the URL to your blob storage account, but the container and the backup file name. My container is ‘scarybu’ and the file name is ‘MyNewDB.bak.’ In the WITH clause you’ll specify the CREDENTIAL that you created earlier.

Yes. It’s that easy. Restore is the same thing only in reverse.
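
As a sketch of the reverse trip (same made-up container, file and credential names as the backup above):

RESTORE DATABASE MyNewDB
FROM URL = N'http://myserver.blob.core.windows.net/scarybu/MyNewDB.bak'
WITH CREDENTIAL = N'MyCredentialName',
     REPLACE, --overwrite the existing copy of the database
     STATS = 10;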

Now you can protect your backups by ensuring that you also put a copy to an offsite location, and you can do it directly from within your area of expertise and control, SQL Server and Management Studio. For more details refer back to the Books Online.

Aug 07 2013

Be Cautious When Criticizing Guidance

I recently posted some comments about some guidance offered by Microsoft when talking about the CXPACKET wait type. Nothing I said was wrong, no. But, evidently there are a few whip-smart and agile people who work for Microsoft. Specifically, Rick Byham, who works for the team tasked with taking care of the SQL Server Books Online. Why am I calling Mr. Byham smart and agile? Evidently he saw the blog post and has completely updated the description for CXPACKET at MSDN:

Occurs with parallel query plans when trying to synchronize the query processor exchange iterator. If waiting is excessive and cannot be reduced by tuning the query (such as adding indexes), consider adjusting the cost threshold for parallelism or lowering the degree of parallelism.

WHOOP! That’s awesome work. Thanks Mr. Byham.

Of course, one of my best and most popular recent blog posts is now completely incomprehensible if people follow the link to MSDN, but I can live with that. Well done.

And the caution in this case? You never know who is going to read this stuff, so try to be nice when offering criticisms. I could be a little more respectful with my criticisms, especially since I’ve put my full share of mistakes and weak advice out in books, articles and on this blog. My apologies if my flippant approach ruffled too many feathers. I got two lessons out of one blog post.