In April, I said I was going to start learning Jupyter Notebooks. It's November. Let's get going with your first Jupyter Notebook. A quick aside before we start. I think one of the huge strengths that is going to come out of these things is as a runbook. You can share a notebook with someone, they can run the queries on it against their own systems and return the book, with the results to you. That's going to be extremely useful as a troubleshooting tool, but has all sorts of other functionality as well. I strongly suggest you start learning these things, as I am. Azure Data Studio There are a number of ways to create and consume Jupyter Notebooks, but I want to focus on the functionality around data…
In order to take advantage of R and Python (and Java in SQL Server 2019) directly from your SQL Server scripts, you'll be using the function sp_execute_external_script. When you see this code in use for the first time, it's going to remind you of sp_execute_sql. The very first thing I thought about was, "Oh no. Another SQL Injection vector." I have a little good news and a little bad news. It's Not SQL The first and most important thing to understand is, we're not talking about SQL. Let's start with looking at some code. This is straight from the examples in the Microsoft documentation linked above: DROP PROC IF EXISTS generate_iris_model; GO CREATE PROC generate_iris_model AS BEGIN EXEC sp_execute_external_script @language = N'R' , @script = N' library(e1071); irismodel <-naiveBayes(iris_data[,1:4], iris_data[,5]);…
In my last post I showed how you can create a volume with your container. I then showed a few things you can with a container using a volume. I want to explore volumes just a little bit more. Locate Your Volume To have a little more fun with volumes, first, let's share a drive. You do this in the Settings in Docker Desktop (assuming that's what you're using): While this should just work, it didn't for me until I restarted Docker. So you may need to do that. Go to the drive and create a directory. I'm putting one in at C:\Docker\SQL. Once I've done that, let's create a new container: docker run ` --name SQL19 ` -p 1433:1433 ` -e "ACCEPT_EULA=Y" ` -e 'SA_PASSWORD=$cthulhu1988' ` -v C:\Docker\SQL:/sql `…
In yesterday's blog post we pulled SQL Server images in preparation for today's blog post where we create containers from those images. If you haven't already, get Docker installed and follow the instructions here to get at least one image on your machine. If you're interested in why I'm talking about containers all week, read this. I'm using all PowerShell commands to control Docker. Docker Run You can use 'docker create' to create an image and then start it up. However, we can just get started running a container from one of the images we downloaded yesterday. We can just simultaneously create and start the container using 'docker run': docker run -e 'ACCEPT_EULA=Y' ` -e 'SA_PASSWORD=$cthulhu1988' ` -p 1433:1433 ` --name dockerdemo ` -d mcr.microsoft.com/mssql/server:2017-latest Let's break this down a…
The system_health Extended Events session is incredibly useful. Further, it's running, by default, in every server you have under management that is 2008 or greater. Things are not the same in Azure though. system_health in Azure SQL Database If you look at the documentation for system_health, it shows that it's applicable to Azure SQL Database. However, if you try to run the example query, it won't work. This is because the implementation of Extended Events inside Azure SQL Database is a little different. Instead, you need to use the Azure SQL Database equivalent system views to create the same query like this: SELECT CAST(dxdst.target_data AS XML) FROM sys.dm_xe_database_session_targets AS dxdst JOIN sys.dm_xe_database_sessions AS dxds ON dxds.address = dxdst.event_session_address WHERE dxds.name = 'system_health'; Now, running this in Azure, prepare to be…
Gathering metrics is quite difficult if there are no queries. So, if you're working in non-production environments, but you still want to see some sort of load on the server, how can you do it? I use a simple PowerShell script to simulate load. Simulate Load I've posted a sample of the meat of the script before. It's a simple way to test a procedure. However, I've modified it to do a little more than just run the procedure forever. Here's the script: # connect to the database $SqlConnection = New-Object System.Data.SqlClient.SqlConnection $SqlConnection.ConnectionString = 'Server=WIN-8A2LQANSO51;Database=AdventureWorks2017;trusted_connection=true' # gather values $RefCmd = New-Object System.Data.SqlClient.SqlCommand $RefCmd.CommandText = "SELECT th.ReferenceOrderID, COUNT(th.ReferenceOrderID) AS RefCount FROM Production.TransactionHistory AS th GROUP BY th.ReferenceOrderID;" $RefCmd.Connection = $SqlConnection $SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter $SqlAdapter.SelectCommand = $RefCmd $RefData = New-Object System.Data.DataSet…
At a recent all-day seminar on query performance tuning I was asked a question that I didn't know the answer to: "How do join hints affect adaptive joins?" I don't know. Let's find out together. Adaptive Joins Here's a query that we can run against AdventureWorks: SELECT p.Name, COUNT(th.ProductID) AS CountProductID, SUM(th.Quantity) AS SumQuantity, AVG(th.ActualCost) AS AvgActualCost FROM Production.TransactionHistory AS th JOIN Production.Product AS p ON p.ProductID = th.ProductID GROUP BY th.ProductID, p.Name; Without a columnstore index in SQL Server 2017, the execution plan looks like this: Let's introduce a columnstore index: CREATE NONCLUSTERED COLUMNSTORE INDEX ix_csTest ON Production.TransactionHistory ( ProductID, Quantity, ActualCost ); Now, if we run the same query, the execution plan changes to use an adaptive join like this: You can read more on adaptive joins here…
Why don't "actual execution plans" have "actual execution plan costs"? This is a question and a myth I have to fight against all the time. It's so hard to convince people that all execution plans are estimated plans in the first place (by the way, all execution plans are estimated plans). If we execute a query at the same time we capture a plan, we have enabled SQL Server to also capture run-time metrics with that plan. So we end up with what is known as an actual plan, but it's still just an estimated plan plus those run-time metrics. Execution Plan Costs When you look at a given operator within an estimated plan, it's going to show you four numbers related to cost: Estimated CPU Cost Estimated I/O Cost…
I was surprised to find out that a lot people hadn't heard about the new join type, Adaptive join. So, I figured I could do a quick overview. Adaptive Join Behavior Currently the adaptive join only works with columnstore indexes, but according to Microsoft, at some point, they will also work with rowstore. The concept is simple. For larger data sets, frequently (but not always, let's not try to cover every possible caveat, it depends, right), a hash join is much faster than a loops join. For smaller data sets, frequently, a loops join is faster. Wouldn't it be nice if we could change the join type, on the fly, so that the most effective join was used depending on the data in the query. Ta-da, enter the adaptive join.…
The preferred method for modifying your data within a database is T-SQL. While the last Fundamentals post showed how to use the GUI to get that done, it's not a very efficient mechanism. T-SQL is efficient. UPDATE The command for updating information in your tables is UPDATE. This command doesn’t work the same way as the INSERT statement. Instead of listing all the columns that are required, meaning columns that don’t allow for NULL values, you can pick and choose the individual columns that you want to update. The operation over-writes the information that was stored in the column with new information. In addition to defining the table and columns you want to update, you have to tell SQL Server which rows you’re interested in updating. This introduces the WHERE…