In order to take advantage of R and Python (and Java in SQL Server 2019) directly from your SQL Server scripts, you’ll be using the function sp_execute_external_script. When you see this code in use for the first time, it’s going to remind you of sp_execute_sql. The very first thing I thought about was, “Oh no. Another SQL Injection vector.” I have a little good news and a little bad news.
It’s Not SQL
The first and most important thing to understand is, we’re not talking about SQL. Let’s start with looking at some code. This is straight from the examples in the Microsoft documentation linked above:
DROP PROC IF EXISTS generate_iris_model; GO CREATE PROC generate_iris_model AS BEGIN EXEC sp_execute_external_script @language = N'R' , @script = N' library(e1071); irismodel <-naiveBayes(iris_data[,1:4], iris_data[,5]); trained_model <- data.frame(payload = as.raw(serialize(irismodel, connection=NULL))); ' , @input_data_1 = N'select "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_data' , @input_data_1_name = N'iris_data' , @output_data_1_name = N'trained_model' WITH RESULT SETS ((model varbinary(max))); END; GO
As written, this code clearly can’t suffer from SQL Injection, so, we’re done right? Not quite. Let’s explore this just a little.
First thing, see the @input_data_1, etc. That looks like parameter values. In theory we could add parameters to this procedure and then have the input_data, etc., provided through the procedure. Does that open a vector for SQL Injection?
This is not SQL. Those are not parameters. These are definitions we have to provide in order to find the appropriate data set and then output the data. The code involved is not SQL, therefore, it doesn’t suffer the same fate if we put a semi-colon in and proceeded to attempt SQL Injection.
But, the Data Set!
The sharp eyed will see that the data set is defined by SQL. So, does that suffer from injection attacks? Short answer is no. If there was more than one result set within the Python code, it’s going to error out. So you’re protected there.
This is important, because the data set query can be defined with parameters. You can pass values to those parameters, heck, you’re likely to pass values to those parameters, from the external query or procedure. So, is that an attack vector?
Great. This is all good news right?
A Little Bad News
You can open up one vector for attack. That’s if you parameterize the script itself. Pass that in as a parameter in your procedure. Nothing says you have to use a string like in the example. You could simply pass in the code.
If you also allow individual users to type up any code they want, then you could see people attempting to pass malicious code through R, Python or Java. However, like the CLR implementations done by Microsoft, this code should be safe. If you attempt to do bad things, it should fail, and fail hard.
However, this is a bit of a concern. It doesn’t make me insane, nor should it you. You will need to ensure that you’re doing all the standard things you should be doing anyway. You still should be sanitizing input so that there’s no chance of an attack vector in the beginning. You should also be running security such that no account that is customer facing is also the server admin or sa. Remember, it’s on us to prevent data breaches.
I was asked if this was an attack vector for SQL Injection? I’m happy as hell to report that it’s not. There is the possibility of a bad actor getting some problematic code into your system, which might expose a weakness. However, you have to have made a huge number of bad choices leading up to that point. The overall design absolutely prevents a standard SQL Injection attack, so I wouldn’t worry about that.
On a completely different note, I have some upcoming, all-day, seminars on Database DevOps. I’d love to see you there: