Jul 25 2016

Monitor Query Performance

Blog post #7 in support of Tim Ford’s (b|t) #iwanttohelp, #entrylevel. Read about it here.

Sooner or later when you’re working with SQL Server, someone is going to complain that the server is slow. I already pointed out the first place you should look when this comes up. But what if they’re more precise? What if, you know, or at least suspect, you have a problem with a query? How do you get information about how queries are behaving in SQL Server?

Choices For Query Metrics

It’s not enough to know that you have a slow query or queries. You need to know exactly how slow they are. You must measure. You need to know how long they take to run and you need to know how many resources are used while they run. You need to know these numbers in order to be able to determine if, after you do something to try to help the query, you’ll know whether or not you’ve improved performance. To measure the performance of queries, you have a number of choices. Each choice has positives and negatives associated with them. I’m going to run through my preferred mechanisms for measuring query performance and outline why. I’ll also list some of the other mechanisms you have available and tell you why I don’t like them. Let’s get started.

Dynamic Management Views

Since SQL Server 2005, Dynamic Management Views (DMV) and Functions (DMF) have been available for access all sorts of information about the server. Specifically there are a few DMVs that are focused on queries and query performance. If you go back through my blog, you can find tons of examples where I illustrate their use. You can also see them at work in commercial tools and free tools. Adam Machanic’s sp_WhoIsActive, a free tool, makes extensive use of DMVs. To learn more about DMVs, you can download a free book, Performance Tuning with SQL Server Dynamic Management Views. DMVs are available in Azure SQL Database, Azure SQL Data Warehouse, and all editions of SQL Server.

The information captured by DMVs is an aggregation of all the times the query has been run. This means you can’t find how long the query ran at 3PM yesterday. You can though see the minimum and maximum time the query took as well as the average. The ability to see this information is what makes DMVs useful. However, another important point about DMVs is that they only collect information while a query is in memory. As soon as it leaves the cache (the area of memory it is stored in), so does all the aggregated information about the query in the DMVs.

You use the DMVs for a general understanding of how a query is behaving. They’re not meant for detailed for long term collection of information about queries. For that we use other tools.

Extended Events

Introduced in SQL Server 2008, Extended Events (ExEvents) are a mechanism for capturing detailed information about SQL Server and the processes within. One of those processes is how queries behave. I have multiple examples on this blog on using ExEvents. You can’t go wrong reading about them on Jonathan Kehayias’ blog. Extended events are available in Azure SQL Database and all editions of SQL Server.

When you need to know every query against a database, or each time a particular query is called, and all the details associated with the query (reads, writes, duration), ExEvents are the way to go. ExEvents are very lightweight on the server (but not free) and can be filtered so that you capture just the information you need. The information is detailed and not aggregated. Instead it’s raw. The real issue with capturing this data is the amount of data you’ll be capturing. Testing and careful filtering to ensure you’re dealing with too much information is important. Prior to SQL Server 2012, there was no graphical user interface for reading ExEvent data, so you would have been forced to run queries against the XML that the information is captured within. With the tools available in SQL Server Management Studio, this is no longer the case.

You use ExEvents when you need specific and detailed information about a query. ExEvents are not so good for generalized monitoring.

Query Store

Introduced in Azure SQL Database, and first released in SQL Server with 2016, Query Store is another mechanism for capturing aggregated information about queries. As before, I have examples on how to work with Query Store on my blog. You can also find quite a bit on it over at Simple-Talk. Query Store is pretty specialized still and only available in Azure and SQL Server 2016, but it is in all editions of SQL Server 2016.

Query Store captures information similar to what is available in the DMVs. However, unlike the DMVs, the information that Query Store captures is kept around, even after a query ages out or is removed from cache. This persistence makes Query Store very exciting. You do have to choose to turn it on for each database you wish to capture queries for. It’s not automatic like DMVs. The capture processes are asynchronous, so they should be relatively light weight for most databases.

You use the Query Store when you need to capture query metrics over the long term, but you don’t need detailed information and aggregations works well for you.

Others

There are other ways to measure query performance. You can use the Profiler GUI, but that actually seriously negatively impacts the server. You can bring a server down by using it, so it should be avoided. Profiler generates scripts called trace, which can be used to monitor your server. However, they have a much higher impact than ExEvents and they’re on the deprecation list. Microsoft is not added new trace events for new functionality, so they’re becoming less and less useful with each release. You also can’t use trace against Azure. If you’re writing a query and you just want to see how long it takes to run, you can use SET STATISTICS TIME ON, to capture the execution time. This is a handy way to quickly measure performance. There is also the ability to capture reads and writes using SET STATISTICS IO ON, but, while this does capture the metrics we need, it adds considerable overhead to the query, skewing performance measurement. This is why I stick to ExEvents when I need an accurate measure.

Conclusion

Honest people can disagree about the best way to capture query performance. I have my preferences as you can see. However, I’m fairly certain that everyone would agree that it’s important to know how to capture performance metrics in order to be able to assert that performance has increased or decreased in a measured fashion. You don’t want to guess at query performance, you want to know.

Jul 18 2016

Common Table Expression, Just a Name

The Common Table Expression (CTE) is a great tool in T-SQL. The CTE provides a mechanism to define a query that can be easily reused over and over within another query. The CTE also provides a mechanism for recursion which, though a little dangerous and overused, is extremely handy for certain types of queries. However, the CTE has a very unfortunate name. Over and over I’ve had to walk people back from the “Table” in Common Table Expression. The CTE is just a query. It’s not a table. It’s not providing a temporary storage space like a table variable or a temporary table. It’s just a query. Think of it more like a temporary view, which is also just a query.

Every time I explain this, there are people who don’t believe me. They point to the “Table” in the name, “See. Says so right there. It’s a table.”

It’s not and I can prove it. Let’s create a relatively simple CTE and use it in a query:

WITH    MyCTE
          AS (SELECT    c.CustomerName,
                        cc.CustomerCategoryName
              FROM      Sales.Customers AS c
              JOIN      Sales.CustomerCategories AS cc
              ON        cc.CustomerCategoryID = c.CustomerCategoryID
              WHERE     c.CustomerCategoryID = 4)
    SELECT  *
    FROM    MyCTE;

Now, I’m going to run the query within the CTE and the CTE together as two statements in a batch and capture the execution plans:

ExecPlans

On the top, the CTE, on the bottom, the query. You’ll note that the execution plans are identical. They each have the exact same Query Plan Hash value in the properties, 0x88EFD2B7C165E667, even though they have different Query Hash values, 0x192FFC125A08CC35 and 0xFEB7F2BCAC853CD5, respectively. Further, if I capture the query metrics using extended events, I get identical reads and, on average, identical execution times:

duration

This is because, there is no table being created. The data is not treated differently. A CTE is just a query, not some type of temporary storage.

Heck, let’s do one more thing. Let’s use the latest SSMS plan comparison tool and highlight one of the operators to see what differences there are internally in the plan:

 

plancompare

I don’t see a lot of differences. In fact, I don’t see any. That’s because the optimizer recognizes these two queries as identical. If it was loading data into temporary storage, you would see differences in something. We don’t. This is because, despite the somewhat unfortunate emphasis that gets placed on the Table portion of the name, the emphasis of the name, Common Table Expression, should be on the word Expression.

I will point out an interesting difference, especially useful for those who plug in CTEs everywhere, whether it’s needed or not. Let’s look at the properties of the two plans:

peroperties

You can see the similarities and differences that I pointed out earlier in the Statement, Query Hash and Query Plan Hash, as well as the Estimated Subtree Cost and others. What’s truly interesting is that the CompileCPU, CompileMemory and CompileTime for the CTE is higher than the regular query. While the CTE is just a query, it’s a query that adds a non-zero overhead when used, and therefore, should only be used where appropriate (good gosh, I’ve seen people put it EVERWHERE, on every single query, don’t do that).

Hopefully, this is enough to establish, truly, completely, and thoroughly, that the Common Table Expression is an expression, not a table.

Yeah, I did this before, but it keeps coming up, so I tried a different approach. Let’s see if the word gets out. Your Common Table Expression is not a table.


I love talking about execution plans and query tuning. I’ll be doing this at an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

Don’t  miss your chance to an all day training course on execution plans before SQL Saturday Oslo in September.

Jul 11 2016

Same Query, Different Servers, Different Performance. Now What?

Based on the number of times I see this question on forums, it must be occurring all the time. You have two different servers that, as far as you know, are identical in terms of their options and setup (although not necessarily in terms of power, think a test or pre-production system versus production). On these servers you have a database on each that, as far as you know, is the same as the other in terms of options, objects, maybe even data (although, this does mean that you have unmasked production information in your QA environment, which potentially means you’re going to jail, might want to address this, especially now that I’ve told you about it, mens rea, you’re welcome). On each database you run, as far as you know, the exact same query (whether a straight up ad hoc query, stored procedure, parameterized query, whatever). On your production server, let’s call it PROD, the query runs for 3 hours before it returns, but on the test server, let’s call it TEST, it returns in about 3 seconds.

What. The. Hell.

This is especially troubling because PROD has 8 processors that are much more powerful than TEST, which has only 2 (either can go parallel, so this is the same). TEST only has 16GB of RAM and PROD has 64GB. Further, you know that the databases, data, and statistics are the same because you restored the TEST database from PROD (although, see that jail thing above). However, you’re on top of things. You capture the query metrics so you know exactly the amount of time and the reads or writes from each query and you even do this using extended events so you’re minimizing observer effects. You also have monitoring in place, so you know that there is no blocking in PROD. The query is just a pig there, but not on TEST. You’ve even gone so far as to get an estimated execution plan from each server and they are absolutely different.

Now what?

Well obviously something somewhere is different. Start by comparing everything on both servers and both databases down to… hang on, here, we’ll write a PowerShell script and then….

Wait, wait, wait!

You have the execution plans? Before we start digging through all the properties everywhere and comparing everything to everything, updating statistics 14 times, and all the rest, what if we look at the execution plans. They’re different, so we should start looking at scans & indexes & statistics &….

Wait!

Let’s start simple and clear. Let’s take a look at the properties of the first operator:

AllProperties

This contains a ton of information about the settings of the systems where each plan was generated. Let’s assume that, probably, the databases are the same, as is the data and the statistics, but the problem is a system or database setting. These differences can result in different execution plans and therefore different execution times. If we are dealing with two different servers and we are fairly sure the data, structure and the statistics are the same, the properties of the first operator are a great starting point for understanding what went wrong.

Oh, and the first operator is this one with the red square around it:

first operator

Let’s use the new SSMS Compare Showplan to see the differences between our two execution plans that we captured:

properties compared

(clicking on that will make it bigger)

There are a bunch of differences highlighted, but one ought to jump out pretty quick. That’s right, these two databases have different compatibility levels which resulted in one of them using the new optimizer and one using the old optimizer as evidenced by the CardinalityEstimatorModelVersion. Fun point, both have identical query hash values. Not surprising, but additional, quick, validation that we really are talking about the same query on each server (and one of them isn’t using a trace flag to change the cardinality estimator). You now have enough information to go and make some changes to your system without lots of further fumbling and work.

When dealing with the same query from two servers that ought to result in similar behavior, but doesn’t, get the execution plans (estimated plans are fine here) and compare the the properties of the first operator. That’s the quickest way to identify the issues that could be leading to the differences between the servers.


Want to talk more about execution plans and query tuning? Let’s do it.

In August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’m also going to be in Oslo Norway for a pre-con before SQL Saturday Oslo in September.

Jul 05 2016

Carpenters vs. DBAs

9530113595_907e63b28a_mLet’s get the caveat out of the way up front, I work for a tool vendor.

If you look around at the tools landscape for the Microsoft Data Platform, it’s really interesting. There are a few tools vendors, primarily clustered around monitoring tools, and then there are a bunch of point tools for helping with various aspects of operations against the Data Platform (mostly SQL Server). Some of these are free tools. Some are pay only. Some are a mix. There are variables in the quality of these tools, and I’m sure not going to comment on that. Instead, I find one thing really interesting.

Let’s step back a bit. My neighbors have both worked as carpenters (well, one carpenter, and one general contractor who also does carpentry). They both go out of their way to ensure that their basic tool set is what they consider the best (want to start a fight, ask about hammers, it’s fun). They pay for these basic tools themselves. They also might pay for some of the smaller power tools themselves. The larger tools will be supplied at the work site.

Compare this with your average DBA or Database Developer. Some of the tools available for the Data Platform are clearly “work site” tools, especially the monitoring tools. These must be supplied by the organization (gods above & below, the last thing we want is each DBA or data pro to bring in a different monitoring tool). Then there are the “hand tools” of the Data Platform pro. Interestingly enough, many of these support a floating license such that you could purchase the tool and then “carry” it with you from job site to job site, like a carpenter. Instead, most every Data Pro I know will insist that the company has to supply them with these tools, or that the tools be free (although, you then see the company that won’t let you use free, but unapproved, tools on their site). They refuse to purchase any tool with their own cash. Even though having this tool could improve their work, maybe even make them look better than their peers.

I find this mind-set fascinating. It’s especially so because the average salary for a carpenter and the average for a DBA are somewhat far apart (40k to 70k). You’d think that making nearly double the amount of money, a DBA wouldn’t hesitate to purchase a piece of software that would make them better, as an individual. By and large though, you’d be wrong.

Yeah, I know what you’re thinking. This is just a pitch from a tools vendor. Well, duh. Yeah, it is. However, I also have purchased my own software in the past and it has helped me be better at my job than my peers. As a wise man once said, “If you ain’t cheatin’, you ain’t fightin'”. As much as our community is all about sharing, and I believe in sharing, there is still a competition when it comes to getting and keeping jobs. You’d think that you’d go for any advantage you can get. I sure have. That includes buying a piece of software that helps me do my job better, out of my own pocket. While I do want you to buy tools because I’m selfish and want to keep earning a pay check, I also think you should be purchasing tools so that you can become better at your job. Embrace the healing power of “And.”

Oh, why would you buy anything but a straight claw hammer?

Jun 28 2016

CASE Statement in GROUP BY

Set based operations means you should put everything into a single statement, right?

Well, not really. People seem to think that having two queries is really bad, so when faced with logical gaps, they just cram them into the query they have. This is partly because SQL Server and T-SQL supports letting you do this, and it’s partly because it looks like a logical extension of code reuse to arrive at a query structure that supports multiple logic chains. However, let’s explore what happens when you do this on particular situation, a CASE statement in a GROUP BY clause.

You see this a lot because a given set of data may be needed in slightly different context by different groups within the company. Like many of my example queries, this could be better written. Like many of my example queries, it mirrors what I see in the wild (and for those following along at home, I’m using the WideWorldImporters database for tests now):

CREATE PROCEDURE dbo.InvoiceGrouping (@x INT)
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY CASE WHEN @x = 7 THEN i.ContactPersonID
              WHEN @x = 15 THEN i.AccountsPersonID
              ELSE i.SalespersonPersonID
         END;
GO

Running this for any given value above, 7, 15 or other, you’ll get the same execution plan, regardless of the column used in the GROUP BY. However, Parameter Sniffing is still something of a factor. When you group this data by SalesPersonID, you only get 10 rows back. This will be shown as the estimated number of rows returned if some value other than 7 or 15 is used as a parameter. However, this is always the plan:

coreplan

You can click on that to expand it into something readable. We can eliminate the Parameter Sniffing from the equation if we want to by modifying the query thus:

CREATE PROCEDURE dbo.InvoiceGrouping_NoSniff (@x INT)
AS
DECLARE @x2 INT;
SET @x2 = @x;

SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY CASE WHEN @x2 = 7 THEN i.ContactPersonID
              WHEN @x2 = 15 THEN i.AccountsPersonID
              ELSE i.SalespersonPersonID
         END;
GO

However, except for some deviation on the estimated rows (since it’s averaging the rows returned), the execution plan is the same.

What’s the big deal right? Well, let’s break down the code into three different procedures:

CREATE PROCEDURE dbo.InvoiceGrouping_Contact
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.ContactPersonID;
GO

CREATE PROCEDURE dbo.InvoiceGrouping_Sales
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.SalespersonPersonID;
GO

CREATE PROCEDURE dbo.InvoiceGrouping_Account
AS
SELECT  SUM(il.UnitPrice),
        COUNT(i.ContactPersonID),
        COUNT(i.AccountsPersonID),
        COUNT(i.SalespersonPersonID)
FROM    Sales.Invoices AS i
JOIN    Sales.InvoiceLines AS il
        ON il.InvoiceID = i.InvoiceID
GROUP BY i.AccountsPersonID;
GO

Interestingly enough, these three queries produce a nearly identical execution plan. The one big difference is the Compute Scalar operator that is used to generate a value for the Hash Match Aggregate is no longer in the query:

specificplan

The same basic set of structures, scans against both tables, to arrive at the data. Cost estimates between the two plans are very different though, with the targeted queries having a much lower estimated cost.

Performance-wise, interestingly enough, the average execution time of the first query, only returning the 10 rows, is 157ms on average, while the query grouping directly on the SalesPersonID averages about 190ms. Now, the reads tell a slightly different story with 17428 on the generic query and 5721 on the specific query. So, maybe a server under load will see a significant performance increase. However, let’s deal with what we have in front of us and say that, at least for these tests, the catch-all GROUP BY query performs well.

Now let’s change the paradigm slightly. Let’s add an index:

CREATE INDEX TestingGroupBy ON Sales.Invoices (SalespersonPersonID);

Frankly, this isn’t a very useful index. However, after adding it, the execution plan for the InvoiceGrouping_Sales query changes. Instead of scanning the table, it’s now scanning the index. Despite recompiles and attempts to force it using hints, the original InvoiceGrouping query will not use this index. Duration of the InvoiceGrouping_Sales query drops to 140ms on average and the reads drop a little further to 5021. Getting an 11% increase on performance is a win.

This is a pretty simplified example, however, making the CASE statement more complex won’t improve performance or further assist the optimizer to make good choices. Instead of trying to cram multiple different logical groupings into a single query, a better approach would be to create the three new procedures that I did above, and make the original InvoiceGrouping procedure into a wrapping procedure that chooses which of the individual procedures to call. This way, if you do add indexes in support of each of the different possible groupings, you would realize a positive outcome in your performance.


Want to talk more about execution plans and query tuning?. In August, I’ll be doing an all day pre-con at SQLServer Geeks Annual Summit in Bangalore India.

I’m also going to be in Oslo Norway for a pre-con before SQL Saturday Oslo in September.

Jun 14 2016

Because

gaypride

Because there’s nothing else I can do at the moment.

 

Jun 13 2016

Choosing the Right SQL Server Edition

Post #6 of #entrylevel #iwanttohelp in support of Tim Ford’s (b|t) beginner’s initiative.

If you’re just getting started with SQL Server, the choices you have in front of you are legion. Which drive do you install your instance on? Which drives hold the databases? How many files do you need for a database? What do the tables look like? Which column or columns should be the primary key? Clustered index? Stored procedures? In-Memory? MAXDOP? Et, as they say, cetera. Ad, as they also say, nauseum.

Before any of that though, you need to pick the correct type of SQL Server to install. That’s right, just saying “SQL Server” is not enough. You must pick between:

SQL Server Developer’s Edition
SQL Server Express
Azure SQL Database
SQL Server Standard
SQL Server Enterprise

I could also add Microsoft Analytics Platform (APS) and Azure SQL Data Warehouse to that list, but that’s going to overly complicate things. We’ll just focus on the five core SQL Server types.

The list above is in the order in which you should be getting started with SQL Server. Let me explain why.

SQL Server Developer’s Edition

If you’re just getting started with SQL Server, SQL Server Developer Edition should be your first stop. With the release of SQL Server 2016 two weeks ago, this edition is now free. You should immediately go and download your own copy. You want to get this because it is everything that is available within SQL Server, all the way up to Enterprise Edition. It’s just licensed for development. If you want to learn about any aspect of SQL Server, start here. If you’re developing against any set of functionality within SQL Server, start here. If you need to support different parts of a SQL Server infrastructure, you start here. Prior to 2016, the Developer’s Edition was only $50-$60, and even then, I would have recommended it over free versions of the product. Now, with Developer Edition being completely free, there’s literally no excuse to not use this to get started learning SQL Server.

SQL Server Express

Assuming you’re just building out your first databases and your first server instances, you may be starting off very small, with only a few users and very little data. SQL Server Express allows you to grow your database up to 10gb, in a production environment (production and development are where licensing and costs differ wildly). This provides a mechanism for you to ensure that you really need SQL Server and all that it offers before you go into a full production installation and all that entails. If you’re only ever going to be very small, this offers a no-cost way to have a professional level database, despite the size. There are additional limitations in what Express supports and you can read about them at the link. It’s just a great way to help get you started.

Azure SQL Database

Here we begin to explore the paid versions of SQL Server. If you’re just getting started and you don’t have much experience administering SQL Server, then Azure SQL Database is a great option for you to start. It is a Platform as a Service offering. This means that it’s not a full instance of SQL Server like everything else listed here. Unlike everything else listed here, it’s not a full instance of SQL Server and that can be a good thing. Backups are managed for you as are a whole slew of other server-level settings, maintenance, etc. The cost model is completely based on what you need to store and how much activity you’re going to generate. You can start very small, and then grow as needed, only paying for what you use. This is extremely attractive if you’re just starting out. Further, the capabilities can grow with you as needed including setting up what would otherwise be extremely complicated things like geo-replication.

SQL Server Standard Edition

SQL Server Standard Edition is the workhorse for SQL Server. It does almost everything that most people need. It scales to the size that most people need to scale to. For your standard business and even standard enterprise needs, Standard Edition (note the name) works extremely well. There are limitations on what it can do. However, these limitations are primarily around extreme scale, or high-end behavior. If you’re just getting started, you don’t need to be messing with that kind of stuff.

SQL Server Enterprise Edition

SQL Server Enterprise Edition is the high end. Here is where you need to go to multi-terrabytes in size and you have massive transaction loads. You’re looking at very sophisticated availability and disaster recovery. Again, the name gives it away. You’re generally only going to this edition when you’re working at an enterprise level of scale and architecture. Since you’re just getting started, don’t worry about this.

Yet.

Conclusion

If you’re just starting to explore the SQL Server space, you should have a copy of Developer Edition. It costs nothing and does all you need. From there, expand into the other Editions as and where needed. Just remember to size your system according to how it’s being used, not how it might be used a year from now. Expanding to Standard from Express is easy. Adding a higher level of support in Azure is simple. Going from Standard to Enterprise is easy. Going backwards, that’s hard. Don’t pick the bigger more sophisticated Edition just in case. Make this choice carefully at the start.

Jun 06 2016

Elastic Query in Azure SQL Database and Views

The question came up, how do the constructs necessary for Elastic Query within Azure SQL Database affect your ability to create views that join across databases. Since I haven’t tested this myself, and I haven’t seen anyone else doing anything with it yet, I decided to set up a test. I recreated my cross database queries using the scripts posted here. Let’s create a view:

CREATE VIEW dbo.JoinedView
AS
SELECT  dt.Val,
        dt2.Val AS Val2
FROM    dbo.DB1Table AS dt
LEFT JOIN dbo.DB2Table AS dt2
        ON dt2.ID = dt.ID;

If I run the query, I get back results. Done. We can create views that join between Azure SQL Databases… But, views are all about masking right? What if I wanted to change the name of the table on my database. Could I do something like this?

CREATE EXTERNAL TABLE dbo.CanICheatThisName (
	ID int, 
	Val varchar(50)) 
WITH (
	DATA_SOURCE = DB2Access);

The CREATE script runs just fine with no errors. When I try to query it though:

Msg 46823, Level 16, State 2, Line 55
Error retrieving data from one or more shards.  The underlying error message received was: ‘Invalid object name ‘dbo.CanICheatThisName’.’.

Understanding how the Elastic Query works, of course this fails. So… what about faking it the other way. I’ll create a view in my second database:

CREATE VIEW dbo.FakeTable
AS
SELECT  *
FROM    dbo.DB2Table AS dt;

Now, I’ll create the EXTERNAL TABLE using the name of the view:

CREATE EXTERNAL TABLE dbo.FakeTable (
	ID int, 
	Val varchar(50)) 
WITH (
	DATA_SOURCE = DB2Access);

When I query this, the data comes across just fine.

Conclusion

Creating a view, or any other query, that joins across databases using Elastic Query works just fine. However, if you want to mask things using a view, you might need to get a little creative in how you implement Elastic Query. The good news is, Elastic Query is somewhat, shall we say, elastic in how you set it up. More so than it immediately appears.

Jun 02 2016

Resources for Learning Azure Data Platform

You want to start working with Azure and the Azure Data Platform, but getting started is not easy. Just knowing where to go to find useful information isn’t easy.

I’m here to help.

I’ve started a GitHub repository that is meant to provide a community-based resource that documents where and how you can learn about the Azure Data Platform. This somewhat duplicates my listing of Data Platform Instructors, but it actually frees that up so I can curate the list the way I want. I’ll probably make it a ranking soon. Why not. Anyway, I want to make sure you’re aware of this resource so that you can consume it or contribute to it. Please help me out if you have something to contribute. Otherwise, please help yourself to what I hope will become a great resource for learning about the Azure Data Platform.

Just know that, at this moment, it has a listing of speakers, mostly from here, and my blog with a link to Azure articles. I’m waiting to see what kind of contributions come in before I spend a lot more time creating material myself. My hope is that I can primarily curate the process and let others do the work (in short, I’m lazy).

Jun 01 2016

Independent Azure Data Platform Instructors

azureThe Azure Data Platform is taking off. I’m seeing more and more interest on the forums, at conferences and in my personal interactions. I’ve been teaching the data platform for six years. Almost as soon as it was available, I started working with it, putting up blog posts and setting up sessions. I’ve had stuff in production on the platform for almost that long too. I’m an advocate and, I hope, an independent voice on the topic. By independent in this case, I mean non-Microsoft. Don’t get me wrong, most of the people I learn from work for Microsoft. They are excellent instructors and more knowledgeable on the topic than I’ll ever be. I’m not questioning the ability of Microsoft people to deliver the very best Data Platform content. I just believe that people also want voices that are not directly beholden to the company.

That brings up my list. I have a list of Azure Data Platform Instructors. I’ve been maintaining this for about a eight weeks now. I’m adding to it regularly. At this point, the list is small and in alphabetical order. As it grows, I will begin the process of curating the list. I’ll probably start with categories only, but eventually, I’ll start to pick and choose who goes on the list based on my knowledge of the individuals involved. For now, these are the independent voices who are teaching Azure and the Azure Data Platform. If you’re getting started in Azure, this is the list you need to consult to find the very best independent voices.