How to Tell Your Windows Azure SQL Database Moved

Home / Azure / How to Tell Your Windows Azure SQL Database Moved

The very concept of the Windows Azure SQL Database (WASD) is predicated on the up-time created by having three active copies of your database. Should there be a hardware or software failure that would cause the primary replica to go down, your database gets moved to one of the secondary replicas which has been maintaining a copy of the data from the primary replica. In theory, the worst thing that happens is that you have to retry a query. In fact, building that sort of resilience into your code is a pretty fundamental aspect of working with WASD.

I was asked the question, how do you tell if your database has been moved between replicas. I didn’t have a complete answer, so I set out to find one. The first, more obvious part of the answer, is that you’ll receive a connection error to your app while this event is occurring. Microsoft has a very solid white paper on how to deal with your connections, Windows Azure SQL Database Connection Management. Scrolling down you can see that you’ll receive a very specific error number depending on why you lost your connection. These errors include 40197, which is the number you’ll receive during any of the events that trigger a failover to the secondary replicas. Answer done, right? Well, not exactly.

I knew about the error code, but the real question is, how do you tell if the move occurred while you weren’t connected? Is there a dynamic management object or system view that shows this information? The first thing I thought of is sys.event_log. That’s because it stores aggregated information about connectivity and errors such as deadlocks and throttling. Surely, if you were moved it would be there. But, reading through the documentation there’s no indication of it. And, it looks like that information is stored within the database and is therefore going to be moved (unlike all the information in cache which just goes away).

Speaking of cache, you could try running queries against sys.dm_exec_query_stats to see when the oldest compile time of any given query might be. But, remember, we’re on a shared server with a hundred other databases. Cache is highly volatile. I don’t think I’ve ever seen more than about one day’s worth of information within the cache dependent set of DMOs. In fact, I seldom see more than about an hours worth there and sometimes quite a bit less. So that’s not a possibility unless you set up a mechanism for monitoring your servers and can see a massive delta in what’s in cache between one look at the query stats to the next. But that could still be cache volatility.

Looking through the rest of the DMOs and system views I couldn’t identify anything that definitively answers this question. So, at this point, I’d say, if you don’t get the error, there is no real indication that you’ve moved from a primary to a replica. On the one hand, it concerns me that I don’t know. On the other, who cares. The whole idea behind this approach to development is that the guts of the things are handled by others allowing you to concentrate on building the database and the app code appropriately.

For lots more about Windows Azure SQL Database, sign up for the pre-conference seminar, How to be a Successful DBA in the Changing World of Cloud and On-Premise Data, at Tech Ed 2013 in New Orleans or Tech Ed Europe in Madrid.

 

8 Comments

OK, fine, but what do you think?