Category: Azure

Dec 03 2012

HDInsight, Finally

See this:

That’s right. The install worked. All I had to do was get a completely clean server set up. No domain controller. Now to get my learn on.

Microsoft has a web site with a number of introductory samples. I’ll start there and work through them. The very first example gets me set up with some data that it builds by running a Powershell script, importdata.ps1. But I’m not going to just blindly follow along. I want to see what the heck is happening so I can start understanding this stuff. By the way, thank you Microsoft for making the samples in PowerShell and not forcing me to relearn Python or something else. That would have been frustrating.

The script is really simple. It has two scenarios you can pass it, w3c or RunTraffic. They just change directory and run another PowerShell script, import.ps1, from two different directories. I’ll be the scripts are different. I’m running the w3c scenario, so let’s see what that script is doing.

Ah, now things are getting interesting. There are two functions, one for data generation which uses an executable to make up test data. The other a mechanism for calling Hadoop. Basically it uses two objects, System.Diagnostics.ProcessStartInfo and System.Diagnostics.Process. The ProcessStartInfo is for defining startup information for a process that you then define using the Process command. In this case it’s setting the location of hadoop:

$pinfo.FileName = $env:HADOOP_HOME + "\bin\hadoop.cmd";

Then it sets up arguments, if any. The actual calls to this from the code use a command, dfs, which has different settings -mkdir and -copyFromLocal. From what I can tell, it’s creating a storage location within Hadoop and then moving the data generated over. I’m good with all the scripts I can see except knowing where this dfs thing comes from.

Data load ran just fine:

Data loaded, time to test out a Map/Reduce job. Again there’s a powershell script included for running a simple job, so I check it out. First run, fails. Great. More stuff to try to troubleshoot in order to be able to see this work. This is not going to be easy.

Stepping through and running the scripts might not be the best way to learn this. So, I’m going to now start reading the Big Data Jumpstart Guide. I’ll post more as I learn it.

 

Nov 28 2012

HDInsight: First Pass

I installed HDInsight with no errors. The install is so brain-dead, I couldn’t see how there could be errors. But, I decided to fire up the service to start playing with it. That’s when I hit the errors. First up, I was getting logon errors for the Hadoop service itself. I tried resolving them manually, but was digging a hole. So, I uninstalled, and reinstalled, figuring I had missed an error message somewhere along the line. After this install, I didn’t get logon errors. Instead I got internal errors. Some more investigation and it appears that the service was marked for deletion. The uninstall worked, but somehow wasn’t complete. Yay! So, I did another uninstall after rebooting the server. Install #4 coming up.  And we’re back to logon errors. Specifically:

Starting IsotopeJS services
Starting isotopejs
System error 1069 has occurred.

The service did not start due to a logon failure.

Let’s check the logs, if any. Nothing in the c:\Hadoop directory. Checking the system logs to see if there’s anything there. Nothing I can see. No joy. I’ll go through it all again tonight.

Aug 20 2012

Another Execution Plan… In the Cloud!

A couple of weeks ago I posted about how different execution plans look when using the Azure Management Portal. I promised to look at a more complex plan and here we are. The query looks like this:

SELECT	*
FROM	dbo.Agent AS a
		JOIN dbo.Resource AS r
		ON a.AgentId = r.AgentId
		JOIN dbo.MovieStageResource AS msr
		ON r.ResourceId = msr.ResourceId
		JOIN dbo.MovieStage AS ms
		ON msr.Movieid = ms.MovieID
		   AND msr.MovieStageDefinitionId = ms.MovieStageDefinitionId
		JOIN dbo.Movie AS m
		ON ms.MovieID = m.MovieId
WHERE	a.AgentId = 42;

I didn’t go nuts in creating a crazy complex query. I just wanted to see a standard set of different operators to see how things changed. As before, I’m working off a controlled database that has identical structures and data in both the Azure database and my local copy of SQL Server. This way, the likelihood of identical plans is increased. Plus, if there are differences, it’s coming down to the differences in compilers between the two platforms, which makes things more interesting.

Running the query on my local server, the execution plan looks like this:

 

If you click on it to blow it up, you’ll see a pretty standard plan. If we read it in logical fashion (left-to-right, making Rob proud) you’ll see that a nested loop joins data between the Agent table and another nested loop. That loop joins between yet another nested loop and the MovieStage table. That loop is between one more nested loop and the Movie table. The final loop is between the MovieStageResource table and the Resource table. Nothing shocking or weird.

Same query, different (?) execution plan from Azure

First thing that jumps out at you are the red operators. We’ll get to those in a minute. So the plan looks totally different, right? No, not really. It’s pretty much the same. The estimated operation costs are different, which is interesting, but the basic plan is the same.

First thing to check in every execution plan? The properties in the first operator. Here they are from SQL Server:

Zoom in to read this if you need to. It’s a Full optmization, but the reason for early termination is a TimeOut. This means we can’t really trust this plan. On another day you may even see the plan change with the same query. Certainly if a service pack came out or something else the plan could change. However, this query looks like it’s properly optimized, even though the optimizer says it isn’t. Let’s see the properties from Azure. I’ve zoomed in to the “More” properties to get at some details:

The first point to note is the Estimated Rows value. Here, in Azure, it is 40.369. Meanwhile, back in SQL Server it is 40.8236. Now these differences are pretty subtle. I sure wouldn’t be sweating out of data statistics because of a .4546 discrepancy, but it gives us a clear indication why the estimated values for identical operations against identical data and structures are different. I’ll also note that the Optimization Level on the Azure query was also Full and the Reason for Early Termination was a timeout. These are fun and interesting results.

Now, let’s look at the most costly operator, the Clustered Index Seek operation. First the properties in SQL Server:

This is where things get a little weird. We have an estimated number of executions at 1000 and an estimated number of rows at 1. The actual number of rows was 22. The estimated cost for the operator is 2.24226. All well and good. But when we look at the same values in the Azure operator:

If you look at the Estimated Executions, Actual Rows, Average Row Size, Estimated I/O Cost and Estimated CPU Cost, they’re identical (with some rounding differences). But… the estimated cost of the operation is different; 2.24226 in SQL Server and 1.50594 in Azure. How do I account for the difference? I don’t. I think we’re looking at differences in how the optimizer establishes costs, because everything else looks the same.

Clearly, while the optimizers behave in a very similar manner, after all, we got effectively identical execution plans, they are not the same. The small difference in the cost assigned by the optimizer informs us of this. I’ll have to play around some more with actually complex queries to see what else can be seen.

If you enjoy this kind of drill down on queries, query tuning, execution plans and all that fun stuff, I’d like to suggest you sign up for my full-day pre-conference seminar at the PASS Summit 2012 this fall.

Aug 01 2012

Execution Plans … In the Cloud!

If you’re moving to a fully-fledged SQL database as part of Azure you may never even touch SQL Server Management Studio (SSMS). Instead you may do all the management of your SQL database through the Management Portal. You can actually do everything you need to through this tool. This includes writing and testing T-SQL statements. Believe it or not, it includes execution plans. But, if you’re used to seeing this from a query:

image

Then you may be a little bit surprised at this:

image

Believe it or not, those are identical plans. Yeah, the graphics are different, radically so. But the plans, those are the same.

I have a small test database that I have loaded into my local server and onto my Azure SQL Server database. It has identical structures and, thanks to SQL Data Generator, identical data sets. This means that when I run a query, assuming the statistics are the same, I should get identical plans.

For this post, I’m just going to scratch the surface and take a look at plans for this query:

SELECT * FROM dbo.Movie AS m
WHERE m.MovieId = 4138;

As you can see, both plans include the SELECT operator and a Clustered Index Seek operator. But as I’m constantly saying, just looking at the graphical part of the plan is not enough. You have to look at the properties. Let’s start with the SELECT operator. Here’s the properties from SSMS:

image

For comparison purposes, here’s what we get from the Management Portal. I’m going to post it up in pieces so that we can see it all.

When you click on an operator within the portal you’ll see a small set of properties, almost like the tool tip in SSMS. There’s a button there also “View More.” Clicking on that expands out the full set of properties. Here’s the first screen shot:

image

Obviously the order is different. But there are also some data differences. For example, the NodeID. In a plan in SSMS the nodes are numbered in a physical processing order, starting at zero. It’s one way to tell how the query was processed. In the Management Portal you can see that the first node has an ID of 1. The second one is 2. This follows a logical processing order. Also, the SELECT operator is given a NodeID. It does not have one, even in the XML, within SSMS. The other information is pretty standard and exists in both. I’ll keep scrolling down:

image

Here there are more differences. At the top, Statement Id. This is not displayed within the properties of an SSMS plan, but is available within the XML. It just shows which statement you’re working with if you have multiple T-SQL statements. Other pieces are the same, including exactly the same Subtree cost (which means the statistics are identical and the cost is estimated by the optimizer in both Azure and my SQL Server 2012 instance in the same way). Both plans were TRIVIAL and used simple parameterization.The Optimizer Hardware Dependent Properties show differences, but that’s to be expected. However, it does give you a glimpse at what is exposed to SQL Server within the Azure system. For example, the max degree of parallelism for this system is 3. I’ve got an EstimatedAvailableMemoryGrant of 4500062. Compare that with the 208838 on my 16gb laptop and you can see how much bigger the hardware is for the Azure system. Let me scroll down a little more:

image

The last major difference is more visible in this screen capture than the last one. The “Reason for NonParallel Plan” is not visible in the 2012 properties. It’s also not in the XML. This is new information exposed about this query and the execution plan generated.

With this one operator we can see that these plans, while similar, are not identical. I’ll post more about these in the coming weeks.

For more details on query tuning and execution plans, including Azure, please attend my pre-conference seminar at the PASS Summit 2012.

May 23 2011

SQL Azure Query Tuning

SQL Azure is still SQL Server at the end of the day. This means it is entirely possible to write queries against SQL Azure that really… what’s a good word… stink. So what do you do? It’s all in the cloud. You couldn’t possibly tune the queries, right? Wrong. Many of the same tools that you have available to you, such as execution plans and dynamic management objects, are still available in SQL Azure.

Let’s talk DMOs for a second. First off, don’t make the mistake I did of trying to run these outside the context of a specific database on SQL Azure. You’ll get extremely inconsistent results, trust me on this. Anyway, I did a quick run-down on some of the most used DMOs for performance tuning, the sys.dm_exec_* set. Here’s a complete listing of those procs and whether or not they’re available to you in SQL Azure:

SELECT  *
FROM    sys.dm_exec_requests AS der
--available

SELECT  *
FROM    sys.dm_exec_requests AS der
        CROSS APPLY sys.dm_exec_query_plan(der.plan_handle)
--available

SELECT  *
FROM    sys.dm_exec_requests der
        CROSS APPLY sys.dm_exec_sql_text(der.sql_handle)
--available

SELECT  *
FROM    sys.dm_exec_query_stats AS deqs
--available

SELECT  *
FROM    sys.dm_exec_cached_plans AS decp
--invalid object

SELECT  *
FROM    sys.dm_exec_connections AS dec
--available

SELECT  *
FROM    sys.dm_exec_cursors AS dec
--invalid object

SELECT  *
FROM    sys.dm_exec_requests AS der
        CROSS APPLY sys.dm_exec_plan_attributes(der.plan_handle) AS depa
--invalid object

SELECT  *
FROM    sys.dm_exec_procedure_stats AS deps
--invalid object

SELECT  *
FROM    sys.dm_exec_query_memory_grants AS deqmg
--invalid object

SELECT  *
FROM    sys.dm_exec_query_optimizer_info AS deqoi
--invalid object

SELECT  *
FROM    sys.dm_exec_query_resource_semaphores AS deqrs
--invalid object

SELECT  *
FROM    sys.dm_exec_sessions AS des
--available

SELECT  *
FROM    sys.dm_exec_requests AS der
        CROSS APPLY sys.dm_exec_text_query_plan(der.plan_handle, 0, -1) AS detqp
--available

SELECT  *
FROM    sys.dm_exec_trigger_stats AS dets
--invalid object

SELECT  *
FROM    sys.dm_exec_xml_handles(@@SPID)
--invalid object

The good news, most everything you need is available so you’re really going to be able to go to town on using DMOs as part of your query tuning. The bad news, “most everything” doesn’t include sys.dm_exec_query_optimizer_info is not on the list. This does take away a tool. It’s not a vital tool, but it’s one that allows you to understand some of what’s happening on the system. I’m not going to cry and rend my clothing because I don’t have it, but I will be somewhat disappointed.

This is great news! The tools you’re learning and using (and you are using DMOs, right?) will continue to be available in SQL Azure.

May 09 2011

SQL Azure Migration Wizard

There’s a project over on code plex to come up with a mechanism for validating databases and generating the necessary scripts to allow you to move those databases, and data, over to SQL Azure. It’s called the SQL Azure Migration Wizard. There’s no real install yet since it’s just a beta. You can download the executable and run it.

It’s pretty straight forward stuff. It’s primarily focused on validating that there is a database, script or series of calls (from a trace event file) that will be properly compatible with SQL Azure, but it can also generate a deployment script from the database. Here’s the opening screen:

MigrationWizard_MainScreen

Not much to it. I’ll walk you through the Analyze and Migrate path which covers most of the functionality of the app. Clearly, you click Next from this screen. This will open a connection window that recommends you connect to Master and let the app bring back a list of databases. OK. Fine. Here’s the list that comes back from my server after a standard connection:

MigrationWizard_SelectSource

Select a particular database and move on. I chose a copy of my MovieManagement database that I had created for my DAC pac testing. Clicking on Next opens up the Choose Objects window. You can either take the default and move everything or change the radio button and select individual objects. I selected everything here just to show off the screen:

MigrationWizard_ChooseObjects

If you click on the Advanced button near the bottom of the screen, you get a whole bunch of scripting options. I won’t go into them all, but you get a surprising level of control as to what gets created and how. The most interesting part there are all the compatibility checks. You can almost get a guess as to what Microsoft might be working on for upcoming releases if you assume that things that are included here can be switched on at a later date without compiling a new set of software:

MigrationWizard_Options

Clicking Next on the Choose Objects screen brings up a summary showing all the choices made so far.

MigrationWizard_Summary

Clicking next just generates the script. No really secrets. You can choose to save the script and the output from the process:

MigrationWizard_Script

Clicking next, things start to get interesting. You have to connect up to a SQL Azure database. You’ll forgive me if I don’t share my connection information with you. I’m trying to keep the costs on this silly thing near zero as much as possible. You again get a list of databases with a few interesting buttons on the bottom of the screen.

MigrationWizard_Target

I can change the connection to a new server. I can delete a database or I can create a database. Either way, the script I run is going to run, as is. There is no comparison, so except for your abilities to choose which objects get deployed, there is no incremental build, say, adding a column to a table. It’s all or nothing, drop & create or just create. Once you pick a database you get the choice to run your script and it finishes.

MigrationWizard_Finished

That’s it. Overall, I like this better than the DAC pac, at least for basic setup and operation. But there’s no way to script it, so you’re forced to walk through this wizard over & over again. However, that said, you have a mechanism that clearly offers more control and abilities than the DAC pac currently supports in SQL Azure.

But… I’m still underwhelmed. I’m still not getting my database into source control. I’m still not able to perform minor changes or in-place upgrades. This will move data as well as objects, which is good, but again, it’s all or nothing. No incremental deployments. We’re still not there with Microsoft tools, not yet.

May 02 2011

SQL Azure and the DAC Pac

WallWhen last we left our intrepid hero he had successfully deployed to SQL Azure using the Data-Tier Application Package, a DAC pac. It was easy and I had a database in place in nothing flat. There really weren’t any issues worth mentioning. I wasn’t crazy about the fact that unless I had Visual Studio 2010 I couldn’t edit the Data-Tier apps or get them into source control, but there you.

So, assuming this is a real production application, I’ve just realized that I need to get a new procedure into my database. If I just got and run the script to create the procedure then I’ll be breaking the link between my database and the DAC pac (and yes, I still enjoy saying that knowing that blood pressure is rising with each utterance), which would be bad. Instead, I need to create an updated DAC pac and then upgrade the existing deployment. There’s a bunch of information on how to do this right in the Books Online so it shouldn’t be too hard.

Then, I right click on my DAC pac on my SQL Azure database and here’s my menu:

DACPacMenuAzure

Hmm… But I’ve been testing DAC pacs on my local machine and when I right click on one of my tests there I get this menu:

DACPacMenu

What’s missing? And no, it’s not PowerShell for crying out loud. No Upgrade Data-tier Application menu choice on SQL Azure.

But, I thought I was supposed to deploy from the DAC pac? Well, yes, you’re supposed to deploy, but you shouldn’t need to upgrade, at least not on SQL Azure. And yes, all the words, especially the four letter ones, that just went through your head have been going through mine. You can’t deploy changes to a SQL Azure using a DAC pac.

But, Microsoft is aware of the issue with the DAC pac. I don’t know if they’re working on making it functional in SQL Azure, they may well be. But, they are working on the SQL Azure Migration Wizard. More on that in our next installment. Same Azure-Time. Same Azure-Channel.

Apr 28 2011

SQL University: SQL Azure & PowerShell

SQL-University-Shield-268x300Welcome once more to the Miskatonic branch of SQL University. I see that most off you survived out last encounter… uh, class. Most of you even appear somewhat sane… somewhat. Today we’re going to talk about PowerShell and it’s use with SQL Azure. Which sounds a something like an eldritch horror breaking in from some outer dimension… hmmm… that could explain certain things… So, back to Powershell and SQL Azure.

You can’t run Powershell from SQL Azure. Thank you for attending and enjoy your trip home.

Still here? I did try. Let’s clarify. You can’t run PowerShell from SQL Azure, but I didn’t say that you couldn’t use SQL Azure as a target for PowerShell. It is possible to connect to your SQL Azure databases running PowerShell from a local desktop or server. Let’s examine the basic connections.

The strength of PowerShell, at least in my mind, is in the pipe. The ability to pass information from one command to the next through the pipe is how PowerShell stands out from just another scripting language. SQL Server 2008 R2, and by extension, SQL Azure, has very limited pipe capability (we’re all holding out hope for Denali). What little you can do comes through the Invoke-Sqlcmd (and yes, invoking things at Miskatonic is usually frowned upon).

Connecting to a SQL Azure database from PowerShell is just a question of configuring the connection appropriately. But appropriately isn’t that hard:

Invoke-Sqlcmd -Query "SELECT * FROM sys.dm_exec_requests;" -ServerInstance "myAzureDB.database.windows.net" -Username "MyUserName" -Password "APassword"}

Yeah, it’s that easy. This will connect up and run the query. I’d show you the output, but it’s not all that thrilling.

The other way to use PowerShell is to connect to stuff through SMO. Clearly, if I can connect through Invoke-Sqlcmd, then making an SMO connection should be just as easy, and it is:


[reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") | out-null

$cn = "Server=tcp:myAzureDB.database.windows.net;Database=master;User ID=MyUserName;Password=APassword;Trusted_Connection=False;"
$conn = New-Object System.Data.SqlClient.SqlConnection
$conn.ConnectionString = $cn

$srv = New-Object ("Microsoft.SqlServer.Management.Smo.Server") $conn
$db = $srv.Databases["master"]

$db.ExecuteWithResults("SELECT * FROM sys.dm_exec_requests;")}

Well, OK, maybe not as easy, but it works. I’m not a fan of SMO.

The key to remember, any functionality that is not supported in SQL Azure, will not be supported through any PowerShell calls. But once you’re in, you’re in. You can begin making use of the abilities of PowerShell to perform automation on your SQL Azure database. Be aware though, you’re still paying for data movement charges. There’s nothing magic here. Also, since PowerShell doesn’t yet run on the SQL Azure server itself, remoting is right out.

Thanks for coming to my class. I hope you begin to employ PowerShell with your SQL Azure implementations. I also hope you all get home. The building’s beginning to shake. Since we’re no where near an active fault line, it usually means yet another visitation from beyond space and time, Why can’t they pick on Harvard occasionally, or go down to Princeton? In New Jersey, who’d notice? I’m heading for my bunker… uh, I mean office. See you next time, maybe.

Apr 26 2011

SQL Azure Diagnostic Tool

One of the CSS Engineers from Microsoft has released a diagnostic tool for SQL Azure. It’s worth a look. It’s just using standard DMO queries to pull back data, but they’re put them together inside a nice UI to help out. If you’re already a DMO super-star, this might not be useful, but if you’re looking for an easy way to gather data from your SQL Azure instance, this is a good choice. Just remember, although it wasn’t mentioned on the MS web site, I’m fairly certain you get charged for running these queries. It’s always something to keep in mind when working with SQL Azure.

Apr 25 2011

SQL Azure Deployments

You’ve set up your access to a SQL Azure site so you’re ready to go. Now how do you get your database up there? Well, TSQL works just fine, so you can script it all out to a file or set of files and apply them. Because you can connect with SSMS, you could use the GUI to build out your database. You can also use the Azure development platform and it’s web based GUI to create your database structures. Personally, the scripting method doesn’t seem too bad to me because you can get your code into source control that way. But, Microsoft has something additional in mind. It’s called Data-tier Applications or DAC for short (and yes, there are other things called DAC in SQL Server, what can I say, I don’t work for Microsoft).

A DAC is created by a DAC package, called a DAC pac (yes, yes, I know). A DAC pac can be created one of two ways. You can use Visual Studio 2010 to create a data-tier application, similar to how you would create a regular database in Visual Studio using the Team System database tools. The other option is to create a database using TSQL and/or SSMS and then extract a DAC definition. For this blog post, we’ll just talk about the method of extracting a DAC and creating a DAC pac (and yes, I’m having fun saying this over & over. Can you see a SQL Server MVP near by? His blood pressure is spiking every time he reads those letters).

I have a sample database I’ve been using for some different tests, it consists of a set of tables and primary & foreign keys. I sometimes load it up with data and other times leave it blank. Anyway, here is the list of tables in the database:

MovieManagementDB

To turn this into a DAC package is pretty easy. I’m simply going to right click on the database itself, click on the “Tasks” menu choice and then select “Extract Data Tier Application” This opens a wizard:

Extract_Step1

It’s pretty simple from there. Click on the “Next” button to bring up a few options that you can set. Not much really. You can name the package, give it a version, include a description and give it a location to store the package.

Extract_Step2

Clicking next validates that what you’re making into a DAC pac is a valid set of structures:

Extract_Step3

And then the final click of the Next button builds out the package and you’re done.

Extract_Step4

You would use this approach because it does a couple of things. First, it lets you develop stuff locally. You’re working on your own server using tools you’re comfortable with. Second, you can pass this to another team or another company and they can deploy your database from this package. The concept is, this is all you ever need to deploy a new structure to the database. It’s supposed to do upgrades and everything, all on it’s own. I’m working on testing it.

To deploy this to an Azure database, after you connect to the server in SSMS, open the Management folder. The only thing in it for a SQL Azure server is Data Tier applications. Right click and select “Deploy Data-tier application…” This will open another wizard and you can start by selecting the DAC pac you just created:

Deploy_Step1

You don’t really need to see all the screens because it’s just the other ones in reverse except you get to decide the name of the database you’re deploying. That’s it.

A couple of issues I have with this so far… no data. Microsoft is working on some other utilities for getting data out to the database, and I’ll cover those. No source control. Yeah, you can take the DAC pac itself, it’s just XML, and put that into source control, but that’s not the same thing as having individual tables, indexes and procs in source control as you can do using other tools.

Let me test these some more and I’ll post a few more bits of information about what I find.