I spend quite a bit of time writing about query tuning on this blog. I’ve written (re-written, and am actively re-writing) books on query tuning. But what I like most is talking about query tuning. I love giving sessions at various events on different aspects of query tuning, but what I like best is spending a whole day trying to do a complete brain dump, getting as much information out there as possible. Sound attractive? Then I’ve got a great deal for you. Come to Louisville on June 20th, 2014. We will talk query tuning at length. You have a specific question? Let’s get it answered. Then, the next day, we can all go to SQL Saturday 286 there in Louisville to get more learning and some serious networking. What’s not to like?
I am terribly jazzed to be involved with this amazing event, SQL Intersection. It’s featuring some truly amazing speakers presenting on important topics. It’s being held here on the East Coast, right near the Mouse, the Duck and Dog. This is one of those conferences you need to get to. Check out the lineup. That is some of the smartest, most capable people I know. I’m quite humbled to be on the list with them, so I’ll do my level best to deliver good content. Look at the sessions. While I don’t know precisely when SQL Server 2014 is coming out, I’m sure it’s real soon, so this will be a great place to get a leg-up on understanding what this new set of technology offers, or just learn more about SQL Server in general, Azure, SSRS and SSIS.
Click here now to register for this special event.
In case you don’t know, this query:
UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '%33%';
will run quite a bit slower than this query:
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '333%';
Or this one:
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 = '333';
That’s because the second two queries have arguments in the filter criteria that allow SQL Server to use the statistics in an index to look for specific matching values, and then use the balanced tree (B-tree) of the index to retrieve specific rows. The argument in the first query requires a full scan against the index because there is no way to know which values might match, nor any path through the index to simply retrieve them.
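If you want to see the difference for yourself, here’s a minimal check of my own (a sketch, assuming the dbo.Test1 table and clustered index built in the test script further down):

-- Compare the work done by the two forms: the leading-wildcard LIKE forces a
-- scan of the clustered index, while the bounded LIKE can seek on C1.
SET STATISTICS IO ON;
UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '%33%';
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '333%';
SET STATISTICS IO OFF;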
But, what if we do this:
UPDATE dbo.Test1
SET C2 = CASE WHEN C1 LIKE '19%' THEN 3
              WHEN C1 LIKE '25%' THEN 2
              WHEN C1 LIKE '37%' THEN 1
         END;
We’re avoiding that nasty wild card search, right? So the optimizer should just be able to immediately find those values and retrieve them… Whoa! Hold up there pardner. Let’s set up a full test:
IF (SELECT OBJECT_ID('Test1')) IS NOT NULL
    DROP TABLE dbo.Test1;
GO

CREATE TABLE dbo.Test1 (C1 VARCHAR(50), C2 INT, C3 INT IDENTITY);

SELECT TOP 1500 IDENTITY(INT, 1, 1) AS n
INTO #Nums
FROM Master.dbo.SysColumns sC1,
     Master.dbo.SysColumns sC2;

INSERT INTO dbo.Test1 (C1, C2)
SELECT n, n
FROM #Nums;

DROP TABLE #Nums;

CREATE CLUSTERED INDEX i1 ON dbo.Test1 (C1);

UPDATE dbo.Test1
SET C2 = CASE WHEN C1 LIKE '%42%' THEN 3
              WHEN C1 LIKE '%24%' THEN 2
              WHEN C1 LIKE '%36%' THEN 1
         END;

DBCC FREEPROCCACHE();

UPDATE dbo.Test1
SET C2 = CASE WHEN C1 LIKE '19%' THEN 33
              WHEN C1 LIKE '25%' THEN 222
              WHEN C1 LIKE '37%' THEN 11
              WHEN C1 LIKE '22%' THEN 5
         END;
I added the extra CASE evaluation in the second query in order to get a different query hash value.
Here are the execution plans from the two queries:
They’re pretty much identical. Apart from the difference I forced in the hash values, the only real change is in the details of the Compute Scalar operator. So what’s going on? Shouldn’t that second query use the index to retrieve the values? After all, it avoided that nasty wildcard search, right? Well, yes, but… we introduced a function on the columns. What function, you ask? The CASE statement itself.
This means you can’t use a CASE statement in this manner: it bypasses the index and the statistics in exactly the same way that wrapping the column in any other function does.
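If the goal is only to touch the matching rows, one sketch of an alternative (note this is not semantically identical to the CASE version above, which also sets C2 to NULL for every row that matches none of the patterns) is to give each predicate its own statement so the optimizer can use the index over C1:

-- Each UPDATE gets a plain, sargable predicate the optimizer can seek on.
-- Not identical behavior: rows matching none of the patterns are left alone.
UPDATE dbo.Test1 SET C2 = 3 WHERE C1 LIKE '19%';
UPDATE dbo.Test1 SET C2 = 2 WHERE C1 LIKE '25%';
UPDATE dbo.Test1 SET C2 = 1 WHERE C1 LIKE '37%';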
Cardinality, basically the number of rows the optimizer estimates an operation will process, is a calculation predicated on the statistics available for the columns in question. The statistics used are generally either the values from the histogram or the density. Prior to SQL Server 2014, and going all the way back to SQL Server 7.0 (in the Dark Ages when we had to walk uphill to our cubicles through 15 feet of snow, battling Oracle DBAs and Fenris the whole way), there has been one cardinality estimator (although you can modify its behavior somewhat with a trace flag in 2008R2 and 2012). Not any more. For really complex, edge-case queries, there’s a possibility you may run into a regression from this.
You control whether or not you get the new cardinality estimator by setting the Compatibility Level of the database to SQL Server 2014 (120 for the picky amongst us). This could lead to regression issues. So, you’re going to pretty quickly want to know if your execution plan is using the new Cardinality Estimation Model, right? It’s not tricky. Just look at the properties of the first operator in the plan (I told you to use that first operator). You’ll find one value there that will tell you what you need to know:
Just check this value (which you can also get from the XML behind the graphical plan) to see what calculations the optimizer used to arrive at the plan you’re observing.
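For reference, here’s a minimal sketch of the two pieces involved; the database name is a placeholder:

-- Opt a database in (or out of) the new estimator via its compatibility level.
ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 120; -- new 2014 estimator
-- ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 110; -- old estimator

-- In the XML behind the graphical plan, the same information appears as an
-- attribute on the statement element, for example:
-- <StmtSimple ... CardinalityEstimationModelVersion="120" ...>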
This is an actual quote from what we can only assume is a functional human being:
The database is very big so we stopped taking backup’s.
Eight lords a-leaping, are you kidding me? Seriously! Seriously? By the Great Gu and all the Valkyries in Valhalla, you stopped taking backups of your PRODUCTION database because it was “very big.” And I’ll put down Brobdingnagian stacks of cash that “very big” in this case is probably 200-500gb or at worst 1-2tb. People, assuming you have enough brain stem intact to regulate breathing, you must know, you must by all the sparkly vampires in Twilight KNOW that you need to have backups. Right? I mean, nothing ever goes wrong on this shiny marble we call Dirt, does it? No one would EVER just run a DELETE statement in production without a WHERE clause, would they, you hairy-bottomed, tree-climbing mouth breather? And I’m sure that, if it happened, you could just blow magic unicorn powder at the server in order to get the company’s billing list back, right? Because without that backup, you’re relying on the undivided attention, and total positive intent, of Odin and Freya to ensure that you never, oh, I don’t know, lose power, causing the rocker arm on the disk to bash down repeatedly on the platter like Thor’s Hammer on an Ice Giant’s head, with similar results for your database. Until you have so much data that EMC hosts your company’s holiday party, for free, there is enough disk space, somewhere, to take a backup of your database.
Now, get out there and get it done. Don’t make me travel to each and every one of your places of work with the lead-weighted hickory learning bat to lay some education up side your beanie holder. Please, just take a backup.
Many years ago, I was working with a great DBA. Seriously, a very smart and capable guy. He told me, “We need to put the database into source control, just like app code.” And I just laughed. Not because I disagreed with him. I knew he was right, but I had tried, several times, to do just that. See, I’m not really a DBA. I’m a developer. I knew that code (and all the T-SQL that describes databases is code) needed to be versioned, sourced, tracked and audited. But great googly moogly, it was not an easy thing to do.
I first tried just exporting the entire database into a script and then occasionally checking that script into source control. Yay! Mission Accomplished… Well, I had a database in source control, yes, but I didn’t have any of the great stuff that went with it, most of all, a way to deploy from source control.
Next, I tried just storing the stuff that changed most: procedures. But I had to store everything as an ALTER, or I had to store it all as a DROP/CREATE along with the security settings and extended properties. I tried both. Neither satisfied, and it was WAY too easy for someone else to modify a script the wrong way and bring the entire thing crashing down. Not to mention that any and all structural changes outside of stored procedures had to be built manually, or generated with a compare tool (but not the procs, ’cause we have those in source control, remember) by comparing prod & dev or qa & dev or something & something… Oh yeah, that was fun.
Man, it was painful back then. But now, there are several ways you can do this using Microsoft and/or 3rd party tools.
Why aren’t you?
Seriously, most of you aren’t. I’ve been going all over the country teaching a class on database deployments (next one is in Cleveland if you’re interested) and I know most people don’t put their databases into source control. Of course, I’m pretty sure most people don’t talk to their Dev teams if they can help it, and it does seem like most Dev teams are on less than a perfectly chatty basis with their DBAs. Is that the cause? The thing is, you are experiencing pain in terms of time spent, mistakes made, slower deployments, less frequent deployments, possibly even down time, all because you don’t do your deployments right. And deployments start from source control.
Developers have spent the last 30 years or so figuring out better and better ways to get functional code into the hands of their customers (internal or external) as fast and as accurately as possible. Over the same 30 years, DBAs have been figuring out how to better and better protect the information under our charge, ensuring that it’s backed up, available, performing well, and always on (to use a phrase). My suggestion to you, data pro: talk to your developers. Figure out what they do and how they do it. Take advantage of their years and years of process improvement and apply what they’ve learned to your database development and deployment.
There’s a new concept growing out there, fairly well established within the *nix communities: DevOps. It’s the realization that the world doesn’t begin and end with your database/server/application. Instead, your database/server/application is completely dependent on the other databases/servers/applications it needs in order to run. Notice, no one is more important here. We’re talking about creating mechanisms for teams to deliver functionality to their clients (internal or external). And it all starts with the realization that there are parts of the process that some of us are better at than others. Developers know how to develop and deploy within teams. Let’s learn from them. And the start, well, that’s source control.
So, is your database under source control… for real? If not, get it there. The excuses I used to have are gone. That means the excuses you have now are probably gone too.
Fair warning, I may use the term DevOps more in the future.
Earlier this week I introduced the concept of Managed Backups (and caused less turmoil than I thought I would). Now I want to show you how it works. It’s really simple and quite well documented. Before you get to the insanely simple task of actually enabling Managed Backup, you will need to go through the prerequisites. First, and this should be obvious, but I’ll state it just in case: you need to set up an Azure storage account. That’s so insanely straightforward that I’m not going to say more. Then, you have to set up encryption on your system. I used these commands to prep it:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '$qlserver2012queryperformancetuning';

CREATE CERTIFICATE CloudDojoCert
WITH SUBJECT = 'Backup Encryption Certificate';
Again, shouldn’t have to say this, but covering my bases, don’t lose that password. You might need it later. You’ll also need to create a credential (I had one created for Backup to URL already):
CREATE CREDENTIAL MyCredentialName WITH IDENTITY = 'MyStorageAccountName', SECRET = 'MyAccessKey';
The actual command to get the backups started is crazy simple:
EXEC smart_admin.sp_set_db_backup
    @database_name = 'MyDatabase',
    @retention_days = 10,
    @credential_name = 'CloudDojo',
    @encryption_algorithm = 'AES_128',
    @encryptor_type = 'Certificate',
    @encryptor_name = 'CloudDojoCert',
    @enable_backup = 1;
That’s it. By setting @enable_backup to 1, I’m up and running. I can repeat the script for multiple databases and I’m done. Backups are now automatic. There’s auditing built in. For example, I seem to get errors on RESTORE HEADERONLY occasionally:
DECLARE @eventresult TABLE
    (
     event_type NVARCHAR(512),
     event NVARCHAR(512),
     timestamp DATETIME
    );

INSERT INTO @eventresult
EXEC smart_admin.sp_get_backup_diagnostics
    @begin_time = '12/1/2013',
    @end_time = '12/19/2013';

SELECT *
FROM @eventresult
WHERE event_type LIKE '%admin%';
[FileRetentionAdminXevent] Type = SqlError, Code = 3292, Stage = Undefined, Message = A failure occurred while attempting to execute Backup or Restore with a URL device specified. Consult the Windows Event Log for details.
RESTORE HEADERONLY is terminating abnormally., Additional Info = N/A.
Browsing through the files (using Cerebrata because it makes my life so much easier), I can see that it’s an intermittent problem, not a consistent one. And, since it’s just the verification of the backup, I’m not sweating it that much:
It’s the kind of intermittent error you’re likely to get when working with cloud-based technology. Code accordingly.
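As a rough illustration of what “code accordingly” can mean, here’s a minimal retry sketch of my own (the storage URL is a placeholder, and this pattern is mine, not a feature of Managed Backup):

-- Re-attempt the header check a few times before treating a transient
-- URL failure as a real error.
DECLARE @attempt INT = 1;
WHILE @attempt <= 3
BEGIN
    BEGIN TRY
        RESTORE HEADERONLY
        FROM URL = N'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
        WITH CREDENTIAL = N'CloudDojo';
        BREAK; -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF @attempt = 3
            THROW; -- give up after the third failure
        WAITFOR DELAY '00:00:30'; -- back off before the next attempt
        SET @attempt += 1;
    END CATCH;
END;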
But that’s it. It’s so easy to get set up. You can see that I’ve been running backups for days. Managed Backups are easy to set up. They’re easy to turn on. You can use RESTORE FROM URL to pull these back into your database just as easily as I put them out there. There’s really nothing to it.
Again, this is not for everyone. It’s just another tool in the toolbox. Yet another way of getting things done. Explore it. Let me know what you think.
Some of the new functionality of 2014 is straight forward, non-controversial and easily welcomed by the community. Think, updateable column store indexes. Some of the new functionality is going to raise an eyebrow or three (most of the time, not on one person, but you know the #sqlfamily, we have some interesting mutations represented). Think… managed backups.
Now, why on earth would a process that takes backups for you, including log backups, does it automatically per a schedule and/or when data has changed sufficiently, stores them offsite for protection, and is easy to set up and maintain cause so much controversy? Hey, maybe I’m wrong, it won’t, move along… Or, maybe, I’m right, and this is going to raise more than eyebrows.
We’re talking about surrendering control over your backup processes and letting Microsoft decide if and when you need a full backup or a log backup. We’re also talking about surrendering control over your storage and letting Microsoft manage your files in Azure Blob Storage. OMG! I said Azure. Several people vomited, just a little, into their mouths. Yep, Microsoft is going to take away your backups and your ability to control them. It will then store them in a way you can’t access and Microsoft will not let you have any control over them ever again…
Oh good gosh, really? You believe that? I do have a bridge for sale that I can let you have pretty cheap.
Let’s talk about what managed backup really is, who it might be for, where it could work extremely well and where it might not work well at all.
Managed backup is a service running on your system that is enabled, if you so desire, through T-SQL commands. That’s right, it’s not even turned on if you don’t want it to be. The service is pretty simple. When you enable it, you define a few parameters, which database or databases and a retention period. That’s pretty much it. From there it runs your first full backup on each of the databases you’ve put under management (no system databases and the backup size has to be less than 1tb, a limitation of Azure Blob Storage). Then, it runs log backups at least every two hours, or, after 5mb or more of transactions. Heck, if your log backup chain is broken, it’ll reset and start over. It’s smart. It’ll also run a full backup after 1gb of data growth, or once a week. And again, if you break the log backup chain, it’ll take a full and keep going. All the files go to Azure Blob Storage where you can use backup/restore from a URL to retrieve them. The backup format is, wait for it, SQL Server backups. Because they’re SQL Server backups you can restore them to a SQL Server machine. That’s it. It’s very simple and simplistic.
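If you want to double-check what a given database is currently set to, something like this should show you the settings (the database name here is a placeholder):

-- Review the current Managed Backup configuration for one database.
SELECT *
FROM msdb.smart_admin.fn_backup_db_config('MyDatabase');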
Who this is for, well, if you’re one of those people with the funny taste in your mouth right now, probably not you (although, you know I think you should start learning it). If you’re managing one of the larger database systems, over 1tb in size, you’re out. In fact, if you have a sophisticated backup routine in place now combining differentials and file/filegroup backups, you’re out as well. This is meant for those people who don’t want to or can’t take all the time and trouble to set up a sophisticated backup routine. It’s for people who are already running their servers on Azure VMs. It’s for small shops that don’t have the money to set up multiple off-site storage locations for their backups. It’s for consultants who want a really simple and reliable method for getting clients to have protected backups with off-site storage with a minimal number of steps. In short, there are a lot of people who are going to find this pretty handy. Again, not everyone. But that’s OK, every feature isn’t for everyone either. That mom & pop database that’s about 8gb in size and running on a 32bit server just isn’t going to benefit from in-memory table storage either. Does that negate that functionality? Nope. Nor should the fact that some large enterprise clients will never use this.
If you’ve never set up log backups and aren’t sure how often your system should have them, this is going to be pretty handy since it does it for you. If you don’t have access to a DBA… well, you may still need one because setting up is T-SQL only currently, no GUI, so it might not work well for everyone, but I’m sure someone will write an app or a script for it soon. If you’re already running your databases on Azure VMs, this thing is fast. Really fast. In fact, I think it works quicker than my local disks (the spinning ones, not the SSDs). And, the storage is right there in the cloud, so you won’t have to pay for transfer down to your machine (although plucking a weekly backup out of the blob storage might be a good idea).
Now, there are a few things I don’t like about it. First, I’d like to have the option to back up locally first and then copy to the cloud (that’s how Red Gate implemented their cloud backup solution and it works great). And, I’d like to be able to tweak the knobs on when the backups occur. I’m also still not convinced I like how the monitoring is done, although since the schedule is maintained by SQL Agent, you can set up alerts, secondary monitors, all that fun stuff, on your own. While some sophisticated DBA types will absolutely use this, I think the prime audience is non-DBAs. As such, it needs a clean & extremely simple GUI; you’d see a lot more adoption that way. Heck, I’d say it ought to be a check box on the SQL Server install process, and one you have to uncheck to prevent getting it set up when you install the instance. That way, Next-Next-Next deployments that probably never even consider recovery models or any of the rest of that fun stuff will get backups as part of their installation.
Overall though, I’m excited about the added functionality. I’m especially excited because this functionality is added. It’s not replacing anything. I can still do all my backups my way, completely and utterly. Or, I have another automation tool that I can put to work. I’ve been running this on a test platform for over a month with no issues (while running other tests for other new functionality, I’m finally digging into 2014). I’ll write another post on how to configure this, although I don’t really need to because the Books Online are pretty clear. But, I wanted to get the “They’re taking away my control and putting it all on Azure” freak out done before I posted the technical part of the info. Proceed with the meltdown.
I’m absolutely in love with the concept of being able to create a backup directly to a protected, off-site location. Yeah, you can spend all sorts of money on terribly wonderful technology to enable that within your enterprise. And if you have that kind of money, great. But, what if you’re like most everyone else and you just want a little more protection without mortgaging the house? Let’s take a look at one possibility, backup to URL.
There have been ways to back up to hosted storage, whether it was DropBox, AWS or Azure blob storage, for quite a while. But every method I tried out involved setting up some type of drive on your system. As soon as you had your K:\ drive mapped out to AWS or whatever, you could run a SQL Server backup directly to that location. Done. But I’ve long wanted to just do it straight: back up right to that location, no additional spoofing required. As long as you’re willing to keep your stuff within Azure blob storage, it’s now possible. Further, with SQL Server 2014 you can do it directly from T-SQL, no additional application required (there is an app for 2005, the 2008s and 2012). The how is easy. Let’s talk about the why for a minute.
Backups are not a technological problem. Oh, I know they can both literally and figuratively keep us nerds up at night, but the fundamental definition of the backup is something that the business must provide. They need to tell us how much data to risk, how long we can spend on recovery through things such as the Recovery Point Objective and Recovery Time Objective. I don’t want to get too deep in the weeds defining these terms. What I do want to point out is that your disk is at risk. No, not the one that’s storing your database (although, it’s at risk too). I’m talking about the one storing your backups. That thing is just waiting until the most opportune moment to die. And by opportune I mean of course, when it will be the hugest pain for you. Plus, are you in a flood zone? A wind zone? Earthquakes? Fire? Oh yeah, the gods and the universe well and truly hate your orderly little backup disk. So, it’s a good idea to have a second location for those backups. Back in the day, we paid a company to take our tapes to an off-site storage. Now, you can just go to a hosted storage site like, oh, I don’t know, Azure blob storage.
Here’s how it works. You have to have already set up an Azure account. You should also create a container to keep your backups in (if you need help dealing with blob storage, check out Cerebrata). While you’re there, make sure you get the URL for your storage location and the Access Key for your storage. Now, let’s go back to T-SQL. First, we need to create a CREDENTIAL. This gets us onto our URL without having to paste sensitive information into our backup processes:
CREATE CREDENTIAL MyCredentialName WITH IDENTITY = 'MyStorageAccountName', SECRET = 'MyAccessKey';
With that done, the rest is easy:
BACKUP DATABASE MyNewDB
TO URL = N'http://myserver.blob.core.windows.net/scarybu/MyNewDB.bak'
WITH CREDENTIAL = N'MyCredentialName',
     NAME = N'MyNewDB-Full Database Backup',
     STATS = 10;
The syntax is pretty standard. You’re going to BACKUP a DATABASE named MyNewDB to your URL. In the URL, make sure you supply, not just the URL to your blob storage account, but the container and the backup file name. My container is ‘scarybu’ and the file name is ‘MyNewDB.bak.’ In the WITH clause you’ll specify the CREDENTIAL that you created earlier.
Yes. It’s that easy. Restore is the same thing only in reverse.
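As a rough sketch, using the same placeholder names as the backup above:

-- Pull the backup straight back out of blob storage.
-- REPLACE is only needed if you're overwriting an existing database.
RESTORE DATABASE MyNewDB
FROM URL = N'http://myserver.blob.core.windows.net/scarybu/MyNewDB.bak'
WITH CREDENTIAL = N'MyCredentialName',
     REPLACE,
     STATS = 10;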
Now you can protect your backups by ensuring that you also put a copy in an offsite location, and you can do it directly from within your area of expertise and control: SQL Server and Management Studio. For more details refer back to the Books Online.
Let’s have some fun.
This Friday, November 1, 2013, I’m putting on an all day seminar on query tuning. It’s set up as a pre-conference event for SQL Saturday 255 in Dallas. It’s a 200 level course on understanding how the query optimizer works, the importance of statistics, constraints and indexes, how to read execution plans, and how to take all that knowledge and go to work on tuning your queries.
Here’s the fun. Sign up for the seminar, and bring a nasty query you’ve been trying to tune or a query you don’t understand or an execution plan that’s making you crazy. Depending on the time available near the end of the day, we’ll walk through a few of them. I’ve slightly restructured the seminar so I have some flexibility near the end to do this. It should be a blast. Don’t bring anything that contains sensitive data because I’m going to put it up on the big board in front of your peers. Also, don’t worry about blame. We all know Timmy wrote that query, not you.
There are only a couple of days left to sign up. If you are in the Dallas area and you want to learn query tuning, and maybe have a giggle along the way, please, click the link and register now.