May 19 2015

Book Review: Connected

I heard about Connected from a show on NPR (Yes, I listen to NPR, why do people keep saying that?). It was right after another segment talking about how positivity affects your mood and your ability to think and act in a clear fashion. I’ve long been a believer in the ability of your network to impact you, but I really didn’t think about it beyond that. Hearing about the book Connected changed my thinking, so I ran out (meaning, connected to Amazon) and got a copy.

The premise of the book is pretty simple. You have close friends and acquaintances. Your close friends and acquaintances also have friends and acquaintances, who may or may not overlap with yours. Those people also have another set of friends and acquaintances. And here’s the kicker: that third layer, not your friend, or your friend’s friend, but your friend’s friend’s friend, can affect your daily mood, the amount of exercise you do, whether or not you smoke, your involvement in crime, all sorts of things. The book sets out to prove it. Along the way you also learn things like why you probably only have somewhere between three and eight close friends, why you probably don’t have more than about 100 people that you communicate with regularly (uh, but what about my 7,000+ Twitter followers?), and how these are, to a degree, biological factors hardwired into you. Most interesting of all is how the ripples just fade away at the third layer, over and over again throughout the authors’ studies and testing.

The book was just filled with highly interesting facts about how your network influences you, and about how you can influence your network. The type of network you have also matters. Are you connected to lots of people who aren’t connected to each other (weak ties), or to lots of people who are all connected to one another (strong ties)? Each of these types of networks influences you differently. Your behavior within a network probably follows one of three paths: cooperator (you’re willing to help others), free rider (you let others do the heavy lifting), or enforcer (you make sure everyone follows the rules). Your behavior is also likely to shift between those roles depending on who you’re interacting with and when.

In short, a fascinating book. I do have a nit to pick with it though. At the end of it all, I have a great set of information about what a strong network would look like, and a good sense of why I would want one. But there’s nothing about how to actually build a strong network, other than making sure my friends are connected with each other and that my friends, and as much as possible their friends and their friends’ friends, are all on a positive path. Right. I’m sure that’s easy to work out. Some guidance around that would have been nice.

My own takeaway: be positive, act positive, strive, earn my Trident every day, and, at least according to Connected, that should go out into my network like ripples in a pond. Further, I should see those same ripples coming back, reinforcing my own approaches.

I have no idea how to measure this. Ha!

May 18 2015

Come to Us or We’ll Come to You

No, that’s not a threat. It’s an offer to help out.

Redgate Software is very serious about the efforts we’re putting into creating the tools needed to support your Database Lifecycle Management (DLM) processes. DLM is a vital part of supporting both Agile development methods and moving towards an automated DevOps style of systems management that tightly integrates your software development and deployments with your database development and deployments. I’ve said it before and I’ll say it again: you can get really good at performing manual tasks, or you can get really good at automation. You want to get good at automation, and we’re here to help. Building a smooth process across source control, continuous integration, continuous delivery, and continuous deployment can be a lot of work, but it’s work with huge benefits. We can make it easier.

How, you may ask?

To start with, how about a short consulting engagement where we come to your place of work and perform a detailed assessment of your current database development and deployment processes with the goal of delivering a document explaining how you could move towards a more fully automated DLM methodology? We’re calling that a DLM Health Check. Or, if you’re ready to take the big leap, we have consultants who can come in and help with a long-term engagement to assist you in building out your DLM processes.

Maybe you’re more of a hands-on type who wants to do things yourself. Not a problem. We have a series of hands-on labs that will teach you what you need to know to set up your own DLM processes. Here’s the schedule of upcoming classes:

  • London, UK (May 20)
  • Philadelphia (June 4)
  • Belfast, Northern Ireland (June 26)
  • London (July 8)
  • San Diego, CA (July 16)
  • Manchester, UK (July 24)
  • Baton Rouge, LA (July 31)

There will be more to come, and, if we get enough requests to deliver a class somewhere, we’ll make that happen too. These classes will teach the skills and methods you need to start implementing your own automated DLM processes.

As I said, come to us or we’ll come to you. Either way, we can help you create automation around your database deployments so you can move fast but still protect your production servers.

May 15 2015

I Am Grateful For…

I’m almost done reading the book Connected (I will be by the time this is published). It’s fascinating, although, at least so far, it offers little in the way of suggestions for how to take advantage of this information. That, however, doesn’t change my plans around working on positive thoughts and emotions as a way to hack your own mind and increase productivity. I’m also trying to figure out how to reinforce my network. In short, these are completely self-serving posts.

I am grateful for Thomas LaRock (b|t).

Tom is hard to miss. If you look at the floor all the time, the green shoes might be a giveaway. If you look up a lot, keep looking up because he’s quite tall. Oh yeah, and he’s the President of PASS (for a couple more weeks), so he’s frequently up on stage. All that’s not why I’m grateful for Tom (though I do think he’s done an outstanding job as PASS President; if you agree, you might want to thank him). I’m grateful for Tom because he’s just always been there. When I first hooked into the SQL Server community (on the very 2nd night of SQL Karaoke, before it was called that), Tom was there. He worked for a big investment company when I was working for a big insurance company. We had a lot in common (despite his terrible youth). We were both working on SCOM and presenting on it. We both started working in Azure together. At no point was any of this competition either. He would discover something really cool and, before putting it up on his blog or writing an article, he’d call me and walk me through it. I learned how to do things better because of Tom. I also learned, well, I’m still working on this one, how to be better at supporting others, because that’s just what he does. Tom got a job as an evangelist, so I had to run and get one too. Tom was on the Board at PASS, so I had to run out and do that too. I’m kind of hoping Tom wins the lottery because I’m assuming I will shortly after. Tom is blazing trails all the time, but he makes sure that he reaches back and helps others down the same path. That’s pretty special.

Oh yeah, and I should mention, bacon, Jager and a lopsided football helmet.

Nice work Tom. Thanks.

May 11 2015

Is Performance Better With LEFT JOIN or RIGHT JOIN?

I tend to write my queries using LEFT JOIN. Why? Because logically I see it in my head like this:

Give me all the rows from this table and only those rows that match from the other table.

But, wouldn’t this logic work just as well:

Give me only the rows in this table that match the rows from this other table where I’m selecting all of them.

I know. If I worked on it some more I could make that a better sentence, but I’m pretty sure the logic is still sound. Only matching rows from one data set, all the rows from another data set. In short, RIGHT JOIN.

I read recently that we ought to be making everything into a LEFT JOIN because it performs better. I suspect someone had a bad day, wrote the JOIN criteria poorly, and a new bit of cargo cult wisdom was discovered. Because, well, let’s look at the queries:

SELECT  *
FROM    Sales.SalesOrderHeader AS soh
LEFT JOIN Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID;

SELECT  *
FROM    Sales.SalesOrderDetail AS sod
RIGHT JOIN Sales.SalesOrderHeader AS soh
        ON soh.SalesOrderID = sod.SalesOrderID;

Yeah, I know. Not exactly the height of query optimization and elegance. But the point is made pretty easily. When these queries run, both return 121,317 rows. They have almost identical performance on my system, at 2.8 seconds and 2.7 seconds, with identical reads and CPU. One run. You can run it a million times if you want. I’m not going to, and here’s why:

[Figure: the two execution plans, side by side and identical]

Those execution plans are the same. How do I know that for sure, and not just because they look the same? Because they have the same query_plan_hash value: 0x857588F9A4394676.
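If you want to check that on your own copy of AdventureWorks, the hash values are sitting in the plan cache. Here’s a quick sketch of mine for pulling them out; the LIKE filter is just an assumption about how to find the two statements, so adjust it as needed:

-- Run the LEFT JOIN and RIGHT JOIN queries first, then pull their hashes from the plan cache.
SELECT  dest.text,
        deqs.query_hash,
        deqs.query_plan_hash -- both statements should show the same value here
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text LIKE '%Sales.SalesOrderHeader%'
        AND dest.text NOT LIKE '%dm_exec_query_stats%'; -- exclude this monitoring query itself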

Now, if we changed the logic so that SalesOrderDetail was on the “all” side of the query, of course that would change things, but that changes the logic, not just the JOIN. The query optimizer is able to deal with what is basically just syntactic sugar. No. LEFT JOIN is not better than RIGHT JOIN. It’s just different syntax for the same logic, and the optimizer deals with it.

Want to talk query tuning and execution plans? Let’s get together for an all day seminar at Connections this year. Check it out here.

May 05 2015

Effects of Persisted Columns on Performance

I live for questions. And my favorite questions are the ones where I’m not completely sure of the answer. Those are the questions that make me stop presenting in order to take a note so I can try to answer the question later, usually in a blog post. Guess where we are today?

I was asked at SQL Bits in London about the direct impact of the PERSISTED keyword on calculated columns, for both inserts and selects. I didn’t have a specific answer, so I wrote it down for later (and asked the, self-described, persisting Dane to email me a reminder). He did, so I put together a few tests to try to answer his question.

First, I created three tables:

CREATE TABLE dbo.PersistTest (
    PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    RandomValue1 DECIMAL(5,3),
    RandomValue2 DECIMAL(5,3),
    CalcValue AS (RandomValue1 + RandomValue2),
    PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

CREATE TABLE dbo.PersistTest2 (
    PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    RandomValue1 DECIMAL(5,3),
    RandomValue2 DECIMAL(5,3),
    CalcValue AS (RandomValue1 + RandomValue2)
);
GO

CREATE TABLE dbo.PersistTest3 (
    PersistTestID INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    RandomValue1 DECIMAL(5,3),
    RandomValue2 DECIMAL(5,3),
    PersistValue AS (RandomValue1 + RandomValue2) PERSISTED
);
GO

The first table has two calculated columns, one PERSISTED and one not. The second table has a non-persisted column only. The third table has a persisted column only. This way, I can see the behavior across all these tables and compare them to see where PERSISTED adds to, or removes from, performance.
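As a quick sanity check (my addition, not something from the original tests), sys.computed_columns will confirm which of these computed columns actually got persisted:

-- List the computed columns on the three test tables; is_persisted = 1 means the
-- value is stored in the row rather than calculated at read time.
SELECT  OBJECT_NAME(cc.object_id) AS table_name,
        cc.name AS column_name,
        cc.definition,
        cc.is_persisted
FROM    sys.computed_columns AS cc
WHERE   OBJECT_NAME(cc.object_id) LIKE 'PersistTest%';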

I took a very simple set of queries and ran these a few times on each table to get an average execution time:

INSERT dbo.PersistTest (
    RandomValue1,
    RandomValue2)
VALUES (
    42.2, -- RandomValue1 - decimal
    44.4  -- RandomValue2 - decimal
);
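I haven’t shown the timing harness itself, so here’s a minimal sketch of one way to do it: repeat the batch with the GO n trick in SSMS or sqlcmd, then read an average back from the plan cache. The repeat count and the LIKE filter are assumptions for illustration, not necessarily how the numbers below were gathered:

-- Repeat the insert batch five times (SSMS/sqlcmd batch-repeat syntax).
INSERT dbo.PersistTest (RandomValue1, RandomValue2)
VALUES (42.2, 44.4);
GO 5

-- Average elapsed time, in microseconds, for the cached insert statement.
SELECT  dest.text,
        deqs.execution_count,
        deqs.total_elapsed_time / deqs.execution_count AS avg_elapsed_microseconds
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE   dest.text LIKE '%INSERT dbo.PersistTest %'
        AND dest.text NOT LIKE '%dm_exec_query_stats%'; -- exclude this monitoring query itself

Swap in PersistTest2 and PersistTest3 for the other two runs, and bump the GO count for the 10,000-execution test later in the post.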

The results were:

            Persist    Persist 2    Persist 3
Average     3580.5     1626         2260.5

On average, across about five executions, you can see a distinct difference of about 600 microseconds between Persist 2 and Persist 3, and an even larger average for the Persist table, which has both computed columns. Let’s try the same thing with a few SELECT statements:

SELECT  pt.CalcValue,
        pt.PersistValue
FROM    dbo.PersistTest AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.CalcValue
FROM    dbo.PersistTest2 AS pt
WHERE   pt.PersistTestID = 3;
GO
SELECT  pt.PersistValue
FROM    dbo.PersistTest3 AS pt
WHERE   pt.PersistTestID = 3;
GO

The results are a little less spectacular:

            Persist    Persist 2    Persist 3
Average     115.5      123.5        109.5

You can see that making the calculation on the fly is more costly, with both Persist and Persist 2 coming in higher than Persist 3, which had no calculation to perform when retrieving the data.

Now let’s run the same thing, but 10,000 times, so we can really see what the differences are over time. First, the inserts:

            Persist    Persist 2    Persist 3
Cumulative  84854      68783        73817

Still measured in microseconds, you can see that the cumulative totals are higher for the persisted columns than they are for the non-persisted column. So, there is an overhead for storing the extra information. But, what about queries?

            Persist    Persist 2    Persist 3
Average     221.25     222.16       100.54
Cumulative  2203431    2254238      1005469

I’d say that’s pretty conclusive. When I hit the SELECT statement 10,000 times instead of just five or six, we clearly have differences in behavior. Interestingly enough, the logical reads were identical at 20,000 each. So it really did come down to the time spent calculating the values versus the time spent just retrieving them.
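For what it’s worth, I haven’t shown how those reads were captured either. One simple way (an assumption on my part, not necessarily the method used for the numbers above) is to wrap a single pass of the three selects in SET STATISTICS IO and TIME; each statement should report about 2 logical reads, which lines up with the 20,000 total over 10,000 executions:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- One execution of each select; the Messages tab shows reads and CPU/elapsed time per statement.
SELECT pt.CalcValue, pt.PersistValue FROM dbo.PersistTest AS pt WHERE pt.PersistTestID = 3;
SELECT pt.CalcValue FROM dbo.PersistTest2 AS pt WHERE pt.PersistTestID = 3;
SELECT pt.PersistValue FROM dbo.PersistTest3 AS pt WHERE pt.PersistTestID = 3;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;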

Caveats. This is a very simple test with simple calculations and small persisted values. Even doing something 10,000 times isn’t the same as doing it one million or one hundred million. In short, your mileage may vary, some assembly required, for queries lasting longer than four hours call your physician, it depends.

You can see that a PERSISTED column is going to negatively impact INSERT, but it’s going to positively impact SELECT. The question is, which one is giving you pain at the moment?

UPDATE: Transposed the data in the final table between Persist 2 & Persist 3. Fixed it.

May 01 2015

Speaker of the Month: May 2015

I finally got out to some community events last month and got to see a bunch of great speakers. Not that you don’t see great speakers at paid events, you do. It’s just that they’re usually professional speakers and I want to encourage others when I can.

The Speaker of the Month for May 2015 is John Sterrett (b|t) and his presentation Automate Your Daily Checklist With PBM and CMS at SQL Saturday Boston.

The very first thing that impressed me about this presentation was how John entered the room and immediately started interacting with the attendees. You see a lot of speakers hiding behind the lectern, fiddling with cables, nervously sipping water. Not John. He was chatting as he set up, immediately getting control of the room. It’s a great way to start. Then, it was a very informative lecture. He showed how to use Policy Based Management and the Central Management Server together to create really meaningful monitoring and double checks on the servers. It’s a technique I’ve used, so I really enjoyed seeing it presented. Plus, I learned a few things. For example, I hadn’t realized you can run the Best Practices Analyzer from PBM. His slides were clear and informative. A little too much documentation for my taste, but I know some people love that type of deck, so I’m not going to ding people for it. His demos were very clear and illustrative of the points he was making. John handled questions well, and he used ZoomIt liberally and carefully to make everything clear. It was a fantastic presentation that was very well delivered.

A couple of things I wasn’t entirely crazy about. John kept checking the slides on the projected screen. Come to find out, it was because he couldn’t see them on his laptop. OK, I forgive that. While he was pretty good at repeating the questions, a few times he forgot. It’s so easy to do when you’re speaking (guilty). Also, John didn’t have a necessary dongle for connecting his machine. Never assume you’ll have VGA/HDMI because one of them will always be wrong. But that’s it.

John’s presentation manner is a lot like talking to him. He’s real calm, looks you right in the eye, listens carefully to questions and then gives direct answers. It was a real pleasure to watch. I suggest checking out his sessions when he presents next.

Apr 27 2015

Benefits for Some, All or Only a Few

As a member of the PASS Board of Directors I attended the PASS Business Analytics Conference (BAC) recently. You can read more about it here and here (as well as here).

Let me start with an important note: I am voicing my opinion here as an individual, not an official stance of the PASS organization.

There is controversy around the BAC because of a whole bunch of things, but one question in particular bothered me. It was suggested that the people attending the BAC were just consuming the worth or value that other people who paid for the Summit generated. At first, I just dismissed this concept. It stuck in the back of my mind though. Suddenly I realized why.

Yes, the BAC was partly paid for by Summit. The attendees at the BAC were not all people who would have attended Summit. Maybe 1/3 of them have attended Summit, are going to attend Summit, or might attend Summit. That means the majority will not.

So?

Money from Summit is used to support Chapters. Anyone ever canvassed their attendees at a local user group for who has gone or will go to Summit? I have. Most of the time, far less than 1/3. Do we cut funding for Chapters?

Money from Summit is used to support SQL Saturday. Once again, I’ve canvassed several of these for people who were going to be attending Summit. Again, way less than 1/3. No more funding for SQL Saturday?

How about the Virtual Chapters that money from Summit pays for? How many of those people are attending Summit? I don’t know, but I’d be shocked if it’s 100% or anything close to that. Are we cutting Virtual Chapters?

24 Hours of PASS is also paid for by Summit.

You know, everything that PASS does, whether you like it, and attend it, or not, is paid for by Summit. There are good arguments to be made that we should not be doing the BAC (and arguments that we should). Where the money comes from is absolutely not a part of that argument. Otherwise, we must pull funding from anything and everything that is done by PASS that doesn’t translate to 100% benefits for the people who paid for it, Summit attendees.

I believe that we, the members of PASS, should be open and accepting and willing to try new things, both from a technical perspective and from a personal one. Providing training and community is what we do. Let’s focus on that.

Apr 23 2015

PASS Board Update: April 2015

It’s been a pretty interesting month on the board.

First, we did have a little problem. I’m sure one or two of you may have noticed that the SQL Saturday web site was down for a moment. Well, three days. Joking aside, this was a serious problem, but my involvement was largely peripheral since I’m in charge of Chapters. I tried to help out a little, offering what support I could and, if nothing else, supplying an ear, willing to listen. Quite a few people took advantage of that. I communicated all their information up the chain to HQ and the other board members. Nothing has been held back.

Next, we’ve started the budgeting process. That’s fascinating. As well as a giant pain in the… well, anyway. Thankfully the people at HQ are helping shepherd me through the process.

We’ve had a number of leadership changes at different PASS Chapters. A couple of new groups have been created. We’re making some progress there.

Also this month I went to the PASS Business Analytics Conference.

Let’s talk about this.

The event was really well put together. The keynotes were amazing. You can read about them here and here.  The venue was great. The speakers I saw seemed good (for the most part). We hosted a number of focus group discussions with attendees and speakers to get feedback on the event. Overall, they were extremely happy with it. They also provided a lot of great suggestions to help improve the event if we do it again.

And there is the question. Do we do this again?

I’m not sure.

If you saw my tweets during the event, I was very supportive. I was also actively promoting the event in the weeks leading up to it. All this is part of my responsibilities as a board member. We were committed to an event and I’m going to help the organization ensure that event is successful. Period.

However, if you had asked me whether I would support doing this again next year, prior to going to the event, I would have said no. Now, I’m not sure. The fact is, there’s a split between the perfect BAC audience member and the perfect Summit audience member. That’s not saying that there’s not crossover. There very much is. But, my concern is, can we, the PASS organization, properly support a community that is business focused using the tools we have for supporting a technology focused community? Should we?

From all the feedback, we supplied an event that people liked and cared about and, most importantly, would suggest to others. So we probably can support a business focused community. Should we?

For myself, and all this is just me reporting to you, I think the big question is: does this help our existing membership? Prior to this event, I would have said absolutely not. I would have argued that there was not a path from DBA to analyst. After talking to lots and lots of people at the event, I found that a healthy number of our peers have moved from being a DBA or BI person to being a data analyst. It goes something like “Hey! You know that data stuff, right? Well, we need someone to do this data analysis thingymabob and you just volunteered.”

In short, many data analysts are accidental data analysts and they come from our peers, our community, in fact, our #sqlfamily. Now, should we do this thing? I’m on the fence still, but I don’t mind admitting, I’m wavering towards “Hell, Yes.”

Apr 21 2015

Query Hash Formula Changes Between Versions

The question came up on SQL Server Central that someone wanted to track queries on their 2008R2 instance and on their 2014 instance in order to validate performance after an upgrade. The plan was to use query hash values. I was curious, so I set up a quick test. First, I spun up two VMs in Azure, both A1, both Enterprise, but one for 2008R2 and one for 2014. Then, I grabbed a generic query that runs against DMVs:

SELECT TOP 10
        SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,
                  (CASE deqs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(dest.text)
                       ELSE deqs.statement_end_offset
                            - deqs.statement_start_offset
                   END) / 2 + 1) AS querystatement,
        deqp.query_plan,
        deqs.query_hash,
        deqs.execution_count,
        deqs.last_elapsed_time,
        deqs.last_logical_reads,
        deqs.last_logical_writes,
        deqs.last_worker_time,
        deqs.max_elapsed_time,
        deqs.max_logical_reads,
        deqs.max_logical_writes,
        deqs.max_worker_time,
        deqs.total_elapsed_time,
        deqs.total_logical_reads,
        deqs.total_logical_writes,
        deqs.total_worker_time
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY deqs.total_elapsed_time DESC;

I ran the query and captured execution plans on both servers. No changes, just defaults. The output is here:

0x40B8F425DDC3D692 -- 2014
0x958B3949E630C004 -- 2008R2

I don’t have access to the code at Microsoft, but I think it’s pretty safe to say that the formula for creating the hash value for a given query has changed between versions. This isn’t surprising, but it is a little disappointing. Matching hash values would have meant an easy way to compare the performance of the same queries on two different systems. Now you’ll have to compare based on the T-SQL text, which just isn’t as easy or as clean.
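If you do need to line up statements across the two instances, one rough workaround (my own sketch, not something I’ve tested in anger) is to build a comparison key yourself by hashing the statement text on each server. SHA1 and the NVARCHAR(4000) truncation are concessions to 2008R2, which lacks the SHA2 algorithms and caps HASHBYTES input at 8,000 bytes:

-- Build a cross-version comparison key by hashing the statement text,
-- since query_hash itself is computed differently on 2008R2 and 2014.
SELECT  HASHBYTES('SHA1',
            CONVERT(NVARCHAR(4000),
                SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,
                    (CASE deqs.statement_end_offset
                         WHEN -1 THEN DATALENGTH(dest.text)
                         ELSE deqs.statement_end_offset - deqs.statement_start_offset
                     END) / 2 + 1))) AS text_hash,
        deqs.execution_count,
        deqs.total_elapsed_time,
        deqs.total_logical_reads
FROM    sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest;

It’s cruder than query_hash, since differences in literals or whitespace produce different text and therefore different hashes, but it does give you a key to join the two servers’ DMV output on.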

Apr 16 2015

Azure SQL Database v12 and SQL Magazine

I spend many of my evenings researching and writing. Sometimes it’s writing new books. Sometimes it’s fixing and rewriting old books. Occasionally it’s a blog post like this one. Lately, it’s been a series of articles for SQL Magazine that are all about the new functionality available in Azure SQL Database v12, which was released in February for most data centers in Azure. It’s a whole new ball game. Check out my introductory article for v12 and the one on DMVs in Azure. I have more coming up on CLR in Azure, getting started, PowerShell, DBCC, T-SQL enhancements, Premium Tier and more. I’ll also get into “old” functionality like backup and restore, and I’ll keep exploring new functionality (Azure is nothing if not dynamic) as it becomes widely available.

I know a lot of you have been putting off exploring Azure, especially Azure SQL Database until it was “ready.” Guess what? It’s that time. Feel free to lean on me here, and over at SQL Mag, for questions, suggestions, thoughts, concerns, anything you have about Azure.