I know that more and more of you are moving resources into the Azure Data Platform. In an effort to help you learn about all that Azure offers, I created a GitHub repository of blogs, teachers, courses, and more. It’s open to the community to update. New material has been added recently and there will be more all the time. Check it out. If you know of resources that should be listed there, please submit a pull request and add them right in. This is your resource.
A few months ago I created a GitHub repository for the purpose of building and maintaining a list of resources for Azure Data Platform training. You can see it here. My goal in putting this into GitHub, instead of just running it on my blog, is to make it a community resource. I want all of you to maintain it. If you’re teaching a class (one hour or one week, I don’t care), I’d like you to add yourself to the instructors list. If you have a blog where you post Azure content, please, add your blog. Are you a PowerBI monster? Get on the list. Please, help me create and grow this list so that people have a central, public resource for this information. More and more of you are moving into Azure all the time. You need to know where to go to gain the knowledge necessary to be successful on the Azure Data Platform. That’s what the Azure Data Platform Learning Resources repository is all about.
Click the link. Edit a page. Do it!
You get one clustered index per table.
That bears repeating, you get one clustered index per table.
Choosing a clustered index is an extremely important and fundamental aspect of all your SQL Server work. The one clustered index that you get determines how the data in your table is stored. Because the clustered index determines how your data is stored, it also determines how your data is retrieved. Much of SQL Server is engineered around the clustered index because it is such a foundational object for so much of the rest of the system’s behavior. Without a clustered index, the data in your table is stored in what is called a heap. It is essentially a pile, a heap, of data, stored without internal structure. The data in a heap can be found fairly quickly when only a single value is needed, but as soon as you start to retrieve ranges of data, more than one row, heaps become extremely problematic. Meanwhile, the clustered index stores all your data at the bottom of a B+Tree, structured storage that is used to retrieve this data through the key, or keys, that you give your clustered index. Further, each non-clustered index also uses the key value from your clustered index as its means of retrieving data from the clustered index.
All this means that the vast majority of your tables should have a clustered index. Yes, there are exceptions, but, exceptions should be exceptional.
The standard advice, and it’s not wrong, is to make the clustered key as narrow as possible. Next, clustered indexes are designed to support monotonically increasing values extremely well. Clustered indexes must also be unique internally: you either create them on a unique value, or a UNIQUIFIER gets applied to the key to ensure that each entry is unique. All this taken together means that they work nicely on IDENTITY columns. Because of this advice, you’ll frequently see all the clustered indexes in a database on the IDENTITY columns. However, this is a bit of a trap. You need to remember that the clustered index is not just about storage, but about retrieval. If the most common access path to your data is through, for example, a foreign key, that value might be a better choice for the clustered key. If you always retrieve the data through a date range, that might be a better choice for the clustered key. Keep in mind the standard advice for clustered keys: narrow, monotonically increasing, unique. However, also remember, the clustered index is where your data is stored. It is frequently better to sacrifice one or more of the standard practices in order to be able to use your clustered index to retrieve the data in most of your queries.
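As a sketch of the idea (the table and index names here are hypothetical, not from any particular database), you might keep the IDENTITY column as a nonclustered primary key and put the one clustered index on the column your queries actually range-scan:

-- Hypothetical example: cluster on the common access path (a date range)
-- rather than on the IDENTITY column.
CREATE TABLE dbo.Orders
(
    OrderID INT IDENTITY(1, 1) NOT NULL,
    OrderDate DATETIME NOT NULL,
    CustomerID INT NOT NULL,
    OrderTotal MONEY NOT NULL,
    -- The primary key is enforced through a nonclustered index.
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderID)
);

-- The clustered index goes on OrderDate because most queries retrieve
-- ranges of orders by date. It isn't unique, so SQL Server adds a hidden
-- UNIQUIFIER to duplicate key values.
CREATE CLUSTERED INDEX OrdersOrderDateCI
ON dbo.Orders (OrderDate);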
Think through where and how to apply your clustered indexes, and don’t get stuck applying them in a single manner everywhere in your system. You get one clustered index on a table. Be sure you put it somewhere that it will both work well and be well used.
Azure SQL Data Warehouse can sometimes feel like it’s completely different from SQL Server, but under the covers, it is still (mostly) SQL Server and it does have execution plans. Let’s take a look at one.
I’ve created a new SQL Data Warehouse using the sample database available on the portal, AdventureWorksDW. Here’s a query against that database:
SELECT dd.FullDateAlternateKey AS OrderDate,
       dc.LastName,
       SUM(fis.SalesAmount) AS SumSalesAmount
FROM dbo.FactInternetSales AS fis
JOIN dbo.DimDate AS dd
    ON fis.OrderDateKey = dd.DateKey
JOIN dbo.DimCustomer AS dc
    ON dc.CustomerKey = fis.CustomerKey
GROUP BY dd.FullDateAlternateKey,
         dc.LastName
HAVING SUM(fis.SalesAmount) > 5000.0
ORDER BY OrderDate DESC;
If I attempt to capture an execution plan using the SQL Server Management Studio GUI, nothing happens. If I try to use T-SQL commands, I get an error that those commands are not supported with this version of SQL Server. Same thing if I try to capture a plan using Visual Studio. So… now what? Enter our new command:
EXPLAIN
SELECT dd.FullDateAlternateKey AS OrderDate,
       dc.LastName,
       SUM(fis.SalesAmount) AS SumSalesAmount
FROM dbo.FactInternetSales AS fis
JOIN dbo.DimDate AS dd
    ON fis.OrderDateKey = dd.DateKey
JOIN dbo.DimCustomer AS dc
    ON dc.CustomerKey = fis.CustomerKey
GROUP BY dd.FullDateAlternateKey,
         dc.LastName
HAVING SUM(fis.SalesAmount) > 5000.0
ORDER BY OrderDate DESC;
If I run this through the SSMS query window, I get a syntax error. So we’re now in Visual Studio. This is how we generate an execution plan from within Azure SQL Data Warehouse. What you get is XML output in the results like this:
According to the documentation on EXPLAIN, I should be able to click on the XML and it will open up to explore. In my version of Visual Studio (2015), I didn’t find that to be the case. Instead I had to copy and paste the XML into an XML file window that I created within Visual Studio. This is what I finished with:
<?xml version="1.0" encoding="utf-8"?> <dsql_query number_nodes="1" number_distributions="60" number_distributions_per_node="60"> <sql>SELECT dd.FullDateAlternateKey AS OrderDate, dc.LastName, SUM(fis.SalesAmount) AS SumSalesAmount FROM dbo.FactInternetSales AS fis JOIN dbo.DimDate AS dd ON fis.OrderDateKey = dd.DateKey JOIN dbo.DimCustomer AS dc ON dc.CustomerKey = fis.CustomerKey GROUP BY dd.FullDateAlternateKey, dc.LastName HAVING SUM(fis.SalesAmount) > 5000.0 ORDER BY OrderDate DESC</sql> <dsql_operations total_cost="5.98868068474576" total_number_operations="13"> <dsql_operation operation_type="RND_ID"> <identifier>TEMP_ID_14</identifier> </dsql_operation> <dsql_operation operation_type="ON"> <location permanent="false" distribution="AllComputeNodes" /> <sql_operations> <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_14] ([DateKey] INT NOT NULL, [FullDateAlternateKey] DATE NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation> </sql_operations> </dsql_operation> <dsql_operation operation_type="BROADCAST_MOVE"> <operation_cost cost="1.99584" accumulative_cost="1.99584" average_rowsize="7" output_rows="1188" GroupNumber="12" /> <source_statement>SELECT [T1_1].[DateKey] AS [DateKey], [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey] FROM [DWTest].[dbo].[DimDate] AS T1_1</source_statement> <destination_table>[TEMP_ID_14]</destination_table> </dsql_operation> <dsql_operation operation_type="RND_ID"> <identifier>TEMP_ID_15</identifier> </dsql_operation> <dsql_operation operation_type="ON"> <location permanent="false" distribution="AllDistributions" /> <sql_operations> <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_15] ([OrderDateKey] INT NOT NULL, [CustomerKey] INT NOT NULL, [SalesAmount] MONEY NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation> </sql_operations> </dsql_operation> <dsql_operation operation_type="SHUFFLE_MOVE"> <operation_cost cost="3.93098847457627" accumulative_cost="5.92682847457627" average_rowsize="16" output_rows="60398" GroupNumber="10" /> <source_statement>SELECT [T1_1].[OrderDateKey] AS [OrderDateKey], [T1_1].[CustomerKey] AS [CustomerKey], [T1_1].[SalesAmount] AS [SalesAmount] FROM [DWTest].[dbo].[FactInternetSales] AS T1_1</source_statement> <destination_table>[TEMP_ID_15]</destination_table> <shuffle_columns>CustomerKey;</shuffle_columns> </dsql_operation> <dsql_operation operation_type="RND_ID"> <identifier>TEMP_ID_16</identifier> </dsql_operation> <dsql_operation operation_type="ON"> <location permanent="false" distribution="AllDistributions" /> <sql_operations> <sql_operation type="statement">CREATE TABLE [tempdb].[dbo].[TEMP_ID_16] ([FullDateAlternateKey] DATE NOT NULL, [LastName] NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS, [col] MONEY NOT NULL ) WITH(DATA_COMPRESSION=PAGE);</sql_operation> </sql_operations> </dsql_operation> <dsql_operation operation_type="SHUFFLE_MOVE"> <operation_cost cost="0.0618522101694915" accumulative_cost="5.98868068474576" average_rowsize="111" output_rows="136.985" GroupNumber="25" /> <source_statement>SELECT [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey], [T1_1].[LastName] AS [LastName], [T1_1].[col] AS [col] FROM (SELECT SUM([T2_2].[SalesAmount]) AS [col], [T2_1].[FullDateAlternateKey] AS [FullDateAlternateKey], [T2_2].[LastName] AS [LastName] FROM [tempdb].[dbo].[TEMP_ID_14] AS T2_1 INNER JOIN (SELECT [T3_2].[OrderDateKey] AS [OrderDateKey], [T3_2].[SalesAmount] AS [SalesAmount], [T3_1].[LastName] AS [LastName] FROM [DWTest].[dbo].[DimCustomer] AS T3_1 INNER 
JOIN [tempdb].[dbo].[TEMP_ID_15] AS T3_2 ON ([T3_2].[CustomerKey] = [T3_1].[CustomerKey])) AS T2_2 ON ([T2_1].[DateKey] = [T2_2].[OrderDateKey]) GROUP BY [T2_1].[FullDateAlternateKey], [T2_2].[LastName]) AS T1_1</source_statement> <destination_table>[TEMP_ID_16]</destination_table> <shuffle_columns>FullDateAlternateKey;</shuffle_columns> </dsql_operation> <dsql_operation operation_type="ON"> <location permanent="false" distribution="AllDistributions" /> <sql_operations> <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_15]</sql_operation> </sql_operations> </dsql_operation> <dsql_operation operation_type="ON"> <location permanent="false" distribution="AllComputeNodes" /> <sql_operations> <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_14]</sql_operation> </sql_operations> </dsql_operation> <dsql_operation operation_type="RETURN"> <location distribution="AllDistributions" /> <select>SELECT [T1_1].[FullDateAlternateKey] AS [FullDateAlternateKey], [T1_1].[LastName] AS [LastName], [T1_1].[col] AS [col] FROM (SELECT [T2_1].[FullDateAlternateKey] AS [FullDateAlternateKey], [T2_1].[LastName] AS [LastName], [T2_1].[col] AS [col] FROM (SELECT SUM([T3_1].[col]) AS [col], [T3_1].[FullDateAlternateKey] AS [FullDateAlternateKey], [T3_1].[LastName] AS [LastName] FROM [tempdb].[dbo].[TEMP_ID_16] AS T3_1 GROUP BY [T3_1].[FullDateAlternateKey], [T3_1].[LastName]) AS T2_1 WHERE ([T2_1].[col] > CAST ((5000.0) AS DECIMAL (5, 1)))) AS T1_1 ORDER BY [T1_1].[FullDateAlternateKey] DESC</select> </dsql_operation> <dsql_operation operation_type="ON"> <location permanent="false" distribution="AllDistributions" /> <sql_operations> <sql_operation type="statement">DROP TABLE [tempdb].[dbo].[TEMP_ID_16]</sql_operation> </sql_operations> </dsql_operation> </dsql_operations> </dsql_query>
So now we just save this as a .sqlplan file and open it in SSMS, right?
See, that’s not a regular execution plan at all. Instead, it’s a D-SQL plan. It’s not the same as our old execution plans. You can’t open it as a graphical plan (and no, not even in that very popular 3rd party tool, I tried). You will have to learn how to read these plans differently because, well, they are different.
I’m acting as the curator of a list of Azure Data Platform Learning Resources. It’s speakers and teachers, blogs, articles and more, all focused around the Azure Data Platform. I have it hosted on GitHub so that anyone can contribute. There have been several updates recently that you may want to take a look at.
If you’re looking for where and how to get started in the Azure Data Platform, this is a great resource to get you going.
If you’re teaching the Azure Data Platform, anything from PowerBI to DocumentDB to Azure SQL Data Warehouse and all the stuff in between, please add your information so that this list is up to date.
I love the Query Store. Seriously. It’s a huge leap forward in the capabilities of Azure SQL Database and SQL Server in support of performance monitoring and query optimization. One of my favorite aspects of the Query Store is the ability to force plans. Frankly though, it’s also the scariest part of the Query Store. I do believe that plan forcing will be one of the most ill-used functions in SQL Server since the multi-statement table-valued user-defined function (don’t get me started). However, unlike the UDF, this ill-use will be because of poor understanding on the part of the user, not a fundamental design issue. No, plan forcing and the Query Store are very well constructed. Let me give you an example of just how well constructed they are.
Let’s imagine that you have a situation, such as bad parameter sniffing, where you’ve determined that, of the multiple possible execution plans for a query against a table, there is one preferred plan. Enabling plan forcing to ensure that plan gets used is a no-brainer. Let’s further imagine that you have a junior DBA who is… let’s just say overly aggressive in their duties, such that they do silly things occasionally. What happens when your pretty plan, which uses a particular index, meets your junior DBA, who just dropped that index?
Here’s the setup. We’re using the WideWorldImporters database and we have this query:
SELECT *
FROM Warehouse.StockItemTransactions AS sit
WHERE sit.TransactionOccurredWhen BETWEEN '9/9/2015' AND '9/11/2015';
This query, with the default configuration, will scan the existing table, so I’ll add an index:
CREATE INDEX TransactionOccurredWhenNCI ON Warehouse.StockItemTransactions (TransactionOccurredWhen);
For a limited range such as the one I’m passing above, I’ll get a plan with a key lookup operation which runs faster than the scan, so I’m happy. For a broader range, I’m likely to see a scan again, but since most of my queries have a very narrow range, I’d sure like to be able to force the plan to always compile to the seek and key lookup. To do this I need to find the query_id and plan_id from the Query Store (assuming I’m not using the GUI):
SELECT qsp.plan_id,
       qsp.query_id,
       qsqt.query_sql_text,
       qsp.count_compiles
FROM sys.query_store_plan AS qsp
JOIN sys.query_store_query AS qsq
    ON qsq.query_id = qsp.query_id
JOIN sys.query_store_query_text AS qsqt
    ON qsqt.query_text_id = qsq.query_text_id
WHERE qsqt.query_sql_text LIKE 'SELECT * FROM Warehouse.StockItemTransactions AS sit%';
With those values, I can force the execution plan so that it will always use the plan I want:
EXEC sys.sp_query_store_force_plan @query_id = 42460, @plan_id = 463;
That’s it. I’m happy because I’m going to see the execution plan used over and over, despite any values passed during a recompile.
Along comes our aggressive junior DBA who decides that there are “too many” indexes on the server. No, I don’t know what that means either, but they evidently read it on the internet or something so they drop the index we created before:
DROP INDEX TransactionOccurredWhenNCI ON Warehouse.StockItemTransactions;
What now happens to our lovely execution plan and the plan forcing? We’ll take a look at two events in Extended Events, sql_statement_recompile and query_store_plan_forcing_failed. Nothing happens immediately on dropping the index. The plans associated with that object, if any, are marked as invalid in the cache. The next time we call the query it’s going to recompile and we can see the event:
The most important part of the event is the recompile_cause, which is “Schema changed”. However, I would also note the attach_activity_id.guid. I’ve chosen to enable causality tracking in this Extended Events session. This will cause all events associated with a common activity to get a GUID and then a sequence. This is interesting because, after the recompile event, we get the query_store_plan_forcing_failed event:
The guid value is the same as the event above and the *.seq number is now 2, showing that, for these events, the recompile event occurred and then this event occurred. That makes perfect sense. The plan is marked for recompile, so, it’s going to be recompiled. I have enabled plan forcing though, so I have a particular plan that I want the optimizer to use. However, thanks to my “helpful” junior DBA, the plan is now invalid. You even get the description of what happened in the message field for the event:
Index ‘WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhenNCI’, specified in the USE PLAN hint, does not exist. Specify an existing index, or create an index with the specified name.
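For those following along, the session I used looks something like this (a sketch; the session and file names are mine, and I believe both events live in the sqlserver package, but treat that as an assumption to verify on your build):

CREATE EVENT SESSION QueryStoreForcingFailed
ON SERVER
    ADD EVENT sqlserver.query_store_plan_forcing_failed,
    ADD EVENT sqlserver.sql_statement_recompile
    (WHERE (sqlserver.database_name = N'WideWorldImporters'))
    ADD TARGET package0.event_file
    (SET filename = N'QueryStoreForcingFailed')
WITH (TRACK_CAUSALITY = ON);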
The first question now is, what happens with this query and the execution plan? Does the new plan generated now that the index is missing get stored in cache? Yes, it does. We can validate that by querying the cache, or, when capturing the actual execution plan, checking the “Retrieved from cache” property.
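A sketch of that cache check, using the standard DMVs:

-- Look for the query's plan in cache, along with its use count.
SELECT decp.usecounts,
       decp.cacheobjtype,
       dest.text
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text(decp.plan_handle) AS dest
WHERE dest.text LIKE 'SELECT * FROM Warehouse.StockItemTransactions AS sit%';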
Because plan forcing is enabled, do we see a recompile every time this query is called? The answer to that question is slightly complex. Under normal circumstances, no. As long as that plan remains in cache, it’s simply reused. No other recompiles occur. A normal recompile event will cause another attempt at applying the invalid execution plan and we would see yet another query_store_plan_forcing_failed event for each recompile on the query. However, during testing, Joey D’Antoni (who was helping me play with this when we discussed what would happen when a plan was made invalid) had severe memory pressure on his server. He saw intermittent recompiles with a cause message that said plan forcing had failed. So if your server is under extreme stress and you cause this issue, you might see different messages. Just remember, the cause of the recompiles was not the plan forcing, but the memory pressure.
The fun thing is, as long as I don’t remove the plan forcing or take the query and plan out of the Query Store manually, if I recreate the index on my table with the same name and definition as that expected by the plan, the Query Store will simply reapply the plan and then successfully force it during any subsequent recompile situation. This is because Query Store is persisted with the database and barring outside activity, the information there will remain, just like the rest of the data in the database.
All of this means that Query Store works exactly the way we would expect, not forcing additional recompiles when you, or your junior DBA, inadvertently invalidate a plan. It also works as expected in that forcing a plan is stored with your database so that, assuming you don’t remove that plan from the Query Store, it will simply be reapplied after you fix the problem. It’s fun to see the thought that went behind the design of the behavior of Query Store. However, please, use plan forcing judiciously.
In post #6, I talked about Azure SQL Database as one of the choices you have when you’re picking the type of SQL Server instance you want to run. I want to expand on why you should be considering moving into Azure SQL Database at the start of your career and some of the important differences you’ll have to be aware of as you get going.
Since you are right at the start of your career, you may as well plan on maximizing the life of the knowledge and skills you’re building. By this, I mean spend your time learning the newest and most advanced software rather than the old approach. Is there still work for people who only know SQL Server 2000? Sure. However, if you’re looking at the future, I strongly advocate for going with online, cloud-based systems. This is because, more and more, you’re going to be working with online, connected, applications. If the app is in the cloud, so should the data be. Azure and the technologies within it are absolutely the cutting edge today. Spending your limited learning time on this technology is an investment in your future.
The important point to remember when you start to look at something like Azure SQL Database is that, superficial resemblance aside, it’s not simply SQL Server. Oh, in many ways it is. Those of us who have been teaching and supporting it for years emphasize the similarities. You know how to create a table in SQL Server? Great, same syntax in Azure SQL Database. The same goes for so much of the behavior of Azure SQL Database. However, at its core, Azure SQL Database is not the same as SQL Server. It’s a platform as a service offering. This means you must change the way you think about what you’re going to implement on it and how you’re going to implement it.
Take one example: backups. Since Azure SQL Database is a managed service, you don’t have access to the same type of functionality for backups as you do for regular SQL Server. That’s not to say that you can’t get backups. First, Microsoft itself has backups running on your databases for you. You have point-in-time recovery capabilities built into the service directly. You can also ensure protection of your databases through geo-replication. Further, if you really do want to take a backup for some reason, you can. It’s not through the traditional BACKUP DATABASE command, but it’s still a fully-recoverable copy of your data and structure, through a BACPAC (just make sure you create a copy of your database first). This does mean that you can’t back up your Azure SQL Database through traditional means and then run a RESTORE command locally, but this isn’t how you should be working with Azure SQL Database anyway. Remember, it’s a platform as a service and is therefore not the same. This is just one example of several in how you have to think about Azure SQL Database differently.
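Incidentally, that preliminary copy is a single command run against the same logical server (the database names here are placeholders):

-- Create a transactionally consistent copy of the database first,
-- then export the copy to a BACPAC so the export isn't reading a
-- database that is still taking writes.
CREATE DATABASE MyDatabase_CopyForExport
AS COPY OF MyDatabase;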
Expect that some of your assumptions will have to be questioned as you move into this new space. It does require a modification in your approach, but one that, as someone just starting your career, you’re well positioned to make.
Sooner or later when you’re working with SQL Server, someone is going to complain that the server is slow. I already pointed out the first place you should look when this comes up. But what if they’re more precise? What if, you know, or at least suspect, you have a problem with a query? How do you get information about how queries are behaving in SQL Server?
Choices For Query Metrics
It’s not enough to know that you have a slow query or queries. You need to know exactly how slow they are. You must measure. You need to know how long they take to run, and you need to know how many resources they use while they run. You need these numbers so that, after you do something to try to help the query, you can determine whether or not you’ve actually improved performance. To measure the performance of queries, you have a number of choices. Each choice has positives and negatives associated with it. I’m going to run through my preferred mechanisms for measuring query performance and outline why. I’ll also list some of the other mechanisms you have available and tell you why I don’t like them. Let’s get started.
Dynamic Management Views
Since SQL Server 2005, Dynamic Management Views (DMV) and Functions (DMF) have been available for accessing all sorts of information about the server. Specifically, there are a few DMVs that are focused on queries and query performance. If you go back through my blog, you can find tons of examples where I illustrate their use. You can also see them at work in commercial tools and free tools. Adam Machanic’s sp_WhoIsActive, a free tool, makes extensive use of DMVs. To learn more about DMVs, you can download a free book, Performance Tuning with SQL Server Dynamic Management Views. DMVs are available in Azure SQL Database, Azure SQL Data Warehouse, and all editions of SQL Server.
The information captured by DMVs is an aggregation of all the times the query has been run. This means you can’t find out how long the query ran at 3PM yesterday. You can, though, see the minimum and maximum time the query took, as well as the average. The ability to see this information is what makes DMVs useful. However, another important point about DMVs is that they only collect information while a query is in memory. As soon as it leaves the cache (the area of memory it is stored in), so does all the aggregated information about the query in the DMVs.
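For example, a query along these lines (a sketch, but against the standard DMVs) pulls back the aggregated timings for everything still in cache:

-- Aggregated performance numbers for queries currently in cache.
SELECT dest.text,
       deqs.execution_count,
       deqs.total_elapsed_time,
       deqs.min_elapsed_time,
       deqs.max_elapsed_time,
       deqs.total_elapsed_time / deqs.execution_count AS avg_elapsed_time
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY deqs.total_elapsed_time DESC;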
You use the DMVs for a general understanding of how a query is behaving. They’re not meant for detailed, long-term collection of information about queries. For that we use other tools.
Extended Events

Introduced in SQL Server 2008, Extended Events (ExEvents) are a mechanism for capturing detailed information about SQL Server and the processes within. One of those processes is how queries behave. I have multiple examples on this blog on using ExEvents. You can’t go wrong reading about them on Jonathan Kehayias’ blog. Extended Events are available in Azure SQL Database and all editions of SQL Server.
When you need to know every query against a database, or each time a particular query is called, and all the details associated with the query (reads, writes, duration), ExEvents are the way to go. ExEvents are very lightweight on the server (but not free) and can be filtered so that you capture just the information you need. The information is detailed and not aggregated. Instead it’s raw. The real issue with capturing this data is the amount of data you’ll be capturing. Testing, and careful filtering to ensure you’re not dealing with too much information, is important. Prior to SQL Server 2012, there was no graphical user interface for reading ExEvent data, so you would have been forced to run queries against the XML that the information is captured within. With the tools available in SQL Server Management Studio, this is no longer the case.
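As a sketch of what a filtered session might look like (the session name, database name, and file name here are placeholders):

CREATE EVENT SESSION QueryPerformance
ON SERVER
    ADD EVENT sqlserver.sql_batch_completed
    (WHERE (sqlserver.database_name = N'AdventureWorks'))
    ADD TARGET package0.event_file
    (SET filename = N'QueryPerformance')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);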
You use ExEvents when you need specific and detailed information about a query. ExEvents are not so good for generalized monitoring.
Query Store

Introduced in Azure SQL Database, and first released in SQL Server with 2016, Query Store is another mechanism for capturing aggregated information about queries. As before, I have examples on how to work with Query Store on my blog. You can also find quite a bit on it over at Simple-Talk. Query Store is still pretty specialized and only available in Azure and SQL Server 2016, but it is in all editions of SQL Server 2016.
Query Store captures information similar to what is available in the DMVs. However, unlike the DMVs, the information that Query Store captures is kept around, even after a query ages out or is removed from cache. This persistence makes Query Store very exciting. You do have to choose to turn it on for each database you wish to capture queries for. It’s not automatic like DMVs. The capture processes are asynchronous, so they should be relatively lightweight for most databases.
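Turning it on is a single statement per database (the database name here is a placeholder):

-- Enable Query Store for one database.
ALTER DATABASE AdventureWorks
SET QUERY_STORE = ON;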
You use the Query Store when you need to capture query metrics over the long term, but you don’t need detailed information and aggregation works well for you.
There are other ways to measure query performance. You can use the Profiler GUI, but that has a serious, negative impact on the server. You can bring a server down by using it, so it should be avoided. Profiler can also generate scripts for traces, which can be used to monitor your server. However, traces have a much higher impact than ExEvents and they’re on the deprecation list. Microsoft is not adding new trace events for new functionality, so they’re becoming less and less useful with each release. You also can’t use trace against Azure. If you’re writing a query and you just want to see how long it takes to run, you can use SET STATISTICS TIME ON to capture the execution time. This is a handy way to quickly measure performance. There is also the ability to capture reads and writes using SET STATISTICS IO ON, but, while this does capture the metrics we need, it adds considerable overhead to the query, skewing performance measurement. This is why I stick to ExEvents when I need an accurate measure.
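For that quick, informal check in a query window, the pattern is simply this (borrowing the StockItemTransactions query from earlier as the example):

SET STATISTICS TIME ON;

SELECT *
FROM Warehouse.StockItemTransactions AS sit
WHERE sit.TransactionOccurredWhen BETWEEN '9/9/2015' AND '9/11/2015';

SET STATISTICS TIME OFF;

The Messages tab then shows CPU time and elapsed time for the query.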
Honest people can disagree about the best way to capture query performance. I have my preferences as you can see. However, I’m fairly certain that everyone would agree that it’s important to know how to capture performance metrics in order to be able to assert that performance has increased or decreased in a measured fashion. You don’t want to guess at query performance, you want to know.
The question came up: how do the constructs necessary for Elastic Query within Azure SQL Database affect your ability to create views that join across databases? Since I haven’t tested this myself, and I haven’t seen anyone else doing anything with it yet, I decided to set up a test. I recreated my cross database queries using the scripts posted here.
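For context, the plumbing behind those scripts looks roughly like this (a sketch; the server name, remote database name, credential names, and passwords are placeholders, while DB2Access is the data source name used below):

-- One-time setup in the database that issues the cross-database queries.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPasswordHere>';

CREATE DATABASE SCOPED CREDENTIAL DB2Credential
WITH IDENTITY = '<SqlUserName>',
     SECRET = '<SqlPassword>';

-- The external data source points at the other Azure SQL Database.
CREATE EXTERNAL DATA SOURCE DB2Access
WITH
(
    TYPE = RDBMS,
    LOCATION = '<servername>.database.windows.net',
    DATABASE_NAME = '<RemoteDatabaseName>',
    CREDENTIAL = DB2Credential
);

With the external data source in place, let’s create a view: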
CREATE VIEW dbo.JoinedView
AS
SELECT dt.Val,
       dt2.Val AS Val2
FROM dbo.DB1Table AS dt
LEFT JOIN dbo.DB2Table AS dt2
    ON dt2.ID = dt.ID;
If I run the query, I get back results. Done. We can create views that join between Azure SQL Databases… But views are all about masking, right? What if I wanted to change the name of the table in my database? Could I do something like this?
CREATE EXTERNAL TABLE dbo.CanICheatThisName
(
    ID int,
    Val varchar(50)
)
WITH
(
    DATA_SOURCE = DB2Access
);
The CREATE script runs just fine with no errors. When I try to query it though:
Msg 46823, Level 16, State 2, Line 55
Error retrieving data from one or more shards. The underlying error message received was: ‘Invalid object name ‘dbo.CanICheatThisName’.’.
Understanding how Elastic Query works, of course this fails. So… what about faking it the other way? I’ll create a view in my second database:
CREATE VIEW dbo.FakeTable AS SELECT * FROM dbo.DB2Table AS dt;
Now, I’ll create the EXTERNAL TABLE using the name of the view:
CREATE EXTERNAL TABLE dbo.FakeTable
(
    ID int,
    Val varchar(50)
)
WITH
(
    DATA_SOURCE = DB2Access
);
When I query this, the data comes across just fine.
Creating a view, or any other query, that joins across databases using Elastic Query works just fine. However, if you want to mask things using a view, you might need to get a little creative in how you implement Elastic Query. The good news is, Elastic Query is somewhat, shall we say, elastic in how you set it up. More so than it immediately appears.
You want to start working with Azure and the Azure Data Platform, but getting started is not easy. Just knowing where to go to find useful information isn’t easy.
I’m here to help.
I’ve started a GitHub repository that is meant to provide a community-based resource that documents where and how you can learn about the Azure Data Platform. This somewhat duplicates my listing of Data Platform Instructors, but it actually frees that up so I can curate the list the way I want. I’ll probably make it a ranking soon. Why not. Anyway, I want to make sure you’re aware of this resource so that you can consume it or contribute to it. Please help me out if you have something to contribute. Otherwise, please help yourself to what I hope will become a great resource for learning about the Azure Data Platform.
Just know that, at this moment, it has a listing of speakers, mostly from here, and my blog with a link to Azure articles. I’m waiting to see what kind of contributions come in before I spend a lot more time creating material myself. My hope is that I can primarily curate the process and let others do the work (in short, I’m lazy).