What Happens On AWS RDS?

I was talking with some developers from my team about monitoring, and I said, “We all use the same tools,” referring to other monitoring software. Then, it hit me. How is AWS collecting monitoring data on it’s RDS servers, specifically, the SQL Server instances. So, I set out to determine what happens on AWS RDS when it comes to the native monitoring.

The Setup

This part should be as obvious as it is easy. I’m going to use Extended Events. I’ve written before about how AWS RDS supports Extended Events, so I won’t repeat all that here. I’ll just leave you with the session I’m running to see what happens on AWS RDS:

CREATE EVENT SESSION [ExEventTesting]
ON SERVER
    ADD EVENT sqlserver.rpc_completed,
    ADD EVENT sqlserver.sql_batch_completed
    ADD TARGET package0.event_file
    (SET filename = N'D:\rdsdbdata\log\ExEventTesting', max_file_size = (100))
WITH
(
    STARTUP_STATE = ON
);

I don’t know if they’ll have a lot of procedure calls or if it’ll purely be batches, but I figure I can get the basics. Obviously, they have additional calls they make to gather metrics about the operating system that’s supporing AWS RDS. I’m not going to get any of that info. I’m just going to get the queries.

I let this run for 24 hours so that I’d be sure to have a nice full set of queries. I’m not doing anything else to the load.

What Happens on AWS RDS: Batches

I ended up with about 350,000 events, of which about 225,600 were procedure calls. Just to break things up, I’ll look at aggregates of both batches & procedures, starting with the batches.

The most common query, with 83,500 executions, was this (and yeah, I formatted them with SQL Prompt to make ’em easier to read):

SELECT s.session_id,
       s.database_id,
       s.login_name,
       s.host_name,
       r.sql_handle,
       r.query_hash,
       SUBSTRING(   t.text,
                    (CASE r.statement_start_offset
                         WHEN '-1' THEN
                             0
                         ELSE
                             r.statement_start_offset
                     END
                    ) / 2 + 1,
                    CASE r.statement_end_offset
                        WHEN '-1' THEN
                            0
                        ELSE
                            r.statement_end_offset
                    END
                ) AS text,
       r.wait_type
FROM sys.dm_exec_sessions s,
     sys.dm_exec_requests r
    OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE s.session_id = r.session_id
      AND s.session_id != @@SPID;

Basically returning active queries to the system. No doubt for the AWS Performance Insights pages. Now, I wasn’t running any queries, so there’s literally nothing to report since the query blanks it’s own operations with s.session_id != @@SPID. Smart. Here’s where it would have shown queries:

Another query that was called pretty frequently, 5-6 times a minute, for a total of 8,000+ occurrences was this, looking at the status of the RDS instance I assume:

USE [rdsadmin];
SELECT option_name,
       major_engine_version,
       port,
       lifecycle,
       change_state,
       install_start_epoch,
       install_end_epoch
FROM rds_option_info;

Another was called over 5,000 times, about 3 times a minute was this:

SELECT sqlserver_start_time FROM sys.dm_os_sys_info

Pretty obvious what it’s up to. I’m just surprised at how frequently that one is called.

One really interesting query, also called over 5,000 times across the period, was this:

USE [master];
SELECT * FROM sys.tables WHERE name = 'rds_database_tracking'

OK. Well, I’ve no idea why they need to validate that their table, added to master, is still in sys.tables, but there it is. Also, why are they using SELECT * and not EXISTS or something to just do a validation. So, I did what any red-blooded DBA would do with a table they didn’t recognize… Yeah, SELECT * FROM rds_database_tracking. Only to get this:

Msg 229, Level 14, State 5, Line 4
The SELECT permission was denied on the object ‘rds_database_tracking’, database ‘master’, schema ‘dbo’.

Ooops. It’s easy to forget when you’re connected up to an AWS RDS database, that it’s not an instance, but a database as a service. I don’t have access to everything, even though I’m technically ‘sa’.

Another query with 5000+ executions was this:

SELECT d.database_id AS DatabaseId,
       d.name AS Name,
       d.state_desc AS [State],
       d.user_access_desc AS UserAccess,
       d.recovery_model_desc AS RecoveryModel,
       d.create_date AS CreateDate,
       d.is_encrypted AS IsEncrypted,
       d.is_read_only AS IsReadOnly,
       d.replica_id AS ReplicaId,
       d.group_database_id AS GroupDatabaseId,
       rdt.rds_db_unique_id AS RdsDbUniqueId,
       rdt.lifecycle AS RdsLifecycle
FROM sys.databases d WITH (NOLOCK)
    LEFT OUTER JOIN master.dbo.rds_database_tracking rdt WITH (NOLOCK)
        ON d.name = rdt.database_name;

Again, pretty easy to spot what they’re doing. It’s all about tracking changes to the objects so that they can surface them within the AWS console.

There are a bunch more queries showing what happens on AWS RDS. There are queries cleaning up data stores. Queries added data to data stores. Some queries were only called once. This one was called exactly four times over the period:

CREATE TABLE #SVer
(
    ID INT,
    Name sysname,
    Internal_Value INT,
    Value NVARCHAR(512)
);
INSERT #SVer
EXEC master.dbo.xp_msver;
INSERT #SVer
SELECT t.*
FROM sys.dm_os_host_info
    CROSS APPLY
(
    VALUES
        (1001, 'host_platform', 0, host_platform),
        (1002, 'host_distribution', 0, host_distribution),
        (1003, 'host_release', 0, host_release),
        (1004, 'host_service_pack_level', 0, host_service_pack_level),
        (1005, 'host_sku', host_sku, ''),
        (1006, 'HardwareGeneration', '', ''),
        (1007, 'ServiceTier', '', ''),
        (1008, 'ReservedStorageSizeMB', '0', '0'),
        (1009, 'UsedStorageSizeMB', '0', '0')
) t (id, [name], internal_value, [value]);

-- Managed Instance-specific properties
IF (SERVERPROPERTY('EngineEdition') = 8)
BEGIN
    DECLARE @gen4memoryPerCoreMB FLOAT = 7168.0;
    DECLARE @gen5memoryPerCoreMB FLOAT = 5223.0;
    DECLARE @physicalMemory FLOAT;
    DECLARE @virtual_core_count INT;
    DECLARE @reservedStorageSize BIGINT;
    DECLARE @usedStorageSize DECIMAL(18, 2);
    DECLARE @hwGeneration NVARCHAR(128);
    DECLARE @serviceTier NVARCHAR(128);
    SET @physicalMemory =
    (
        SELECT TOP 1
               [virtual_core_count] * (CASE
                                           WHEN [hardware_generation] = 'Gen4' THEN
                                               @gen4memoryPerCoreMB
                                           WHEN [hardware_generation] = 'Gen5' THEN
                                               @gen5memoryPerCoreMB
                                           ELSE
                                               0
                                       END
                                      )
        FROM master.sys.server_resource_stats
        ORDER BY start_time DESC
    );

    IF (@physicalMemory <> 0)
    BEGIN
        UPDATE #SVer
        SET [Internal_Value] = @physicalMemory
        WHERE Name = N'PhysicalMemory';
        UPDATE #SVer
        SET [Value] = CONCAT(@physicalMemory, ' (', @physicalMemory * 1024, ')')
        WHERE Name = N'PhysicalMemory';
    END;
    UPDATE #SVer
    SET [Internal_Value] =
        (
            SELECT TOP 1
                   [virtual_core_count]
            FROM master.sys.server_resource_stats
            ORDER BY start_time DESC
        )
    WHERE Name = N'ProcessorCount';
    UPDATE #SVer
    SET [Value] = [Internal_Value]
    WHERE Name = N'ProcessorCount';
    SELECT TOP 1
           @hwGeneration = [hardware_generation],
           @serviceTier = [sku],
           @virtual_core_count = [virtual_core_count],
           @reservedStorageSize = [reserved_storage_mb],
           @usedStorageSize = [storage_space_used_mb]
    FROM master.sys.server_resource_stats
    ORDER BY [start_time] DESC;
    UPDATE #SVer
    SET [Value] = @hwGeneration
    WHERE Name = N'HardwareGeneration';
    UPDATE #SVer
    SET [Value] = @serviceTier
    WHERE Name = N'ServiceTier';
    UPDATE #SVer
    SET [Value] = @reservedStorageSize
    WHERE Name = N'ReservedStorageSizeMB';
    UPDATE #SVer
    SET [Value] = @usedStorageSize
    WHERE Name = N'UsedStorageSizeMB';
END;

SELECT
    (
        SELECT Internal_Value FROM #SVer WHERE Name = N'PhysicalMemory'
    ) AS [PhysicalMemory];
DROP TABLE #SVer;

In the end, it’s returning PhysicalMemory.

The really interesting bit was just how normal most of these queries looked, right down to the use of NOLOCK hints. Ha! Let’s look at the procedures:

What Happens On AWS RDS: Procedures

I first tried grouping on the object_name column from rpc_completed:

Well… nuts. That’s not going to tell us much beyond how they’re making their procedure calls, a combination of sp_executesql and sp_prepexec. So, I’m going to try aggregating by statement and see where I end up. However, I’m sure we can all guess.

Yep. All one & two executions because, contrary to rumor, the rpc_completed event does, indeed, contain parameter values. So we’ve got nothing but individual executions. Look, this is a blog post. I could go through the trouble of stripping all the parameter values off, but I don’t want to. I’m lazy. Let’s just look at a few examples and see what information we can gleen about what happens on AWS RDS.

They have tons of sp_prepexec (75,893), and for those who don’t know, it’s a way to parameterize a query without creating a stored procedure. You’ll get plan reuse, pretty much everything that stored procs have, just nothing created on the server. First one I looked at was disappointing:

DECLARE @p1 INT;
SET @p1 = 10993;
EXEC sp_prepexec @p1 OUTPUT,
                 N'@rds_db_unique_id uniqueidentifier',
                 N'SELECT database_name FROM master..rds_database_tracking with (nolock)
              WHERE rds_db_unique_id = @rds_db_unique_id;',
                 @rds_db_unique_id = '35B9B078-7937-457F-B8B3-0FD63C630E7A';
SELECT @p1;

It’s just pulling the database name. I think that did that 5,000 times in batches too. What would be the purpose here?

Next one I picked was slightly more interesting:

declare @p1 int
set @p1=11774
exec sp_prepexec @p1 output,N'@family_guid uniqueidentifier',N'SELECT root_cause FROM rdsadmin..db_awaiting_snapshot WHERE family_guid = @family_guid',@family_guid='EBD0FE66-F91F-43F1-99C3-B52D803BB136'
select @p1

Ah, there we see a mechanism for tracking backups within AWS RDS (guessing, I don’t work there, so what do I know).

One more:

DECLARE @p1 INT;
SET @p1 = 13965;
EXEC sp_prepexec @p1 OUTPUT,
                 N'@taskType0 nvarchar(50),@taskType1 nvarchar(50),@taskType2 nvarchar(50),@taskType3 nvarchar(50),@lifecycle0 nvarchar(50)',
                 N'select [task_id], [task_type], [lifecycle], [created_by],                     
                [created_at], [last_updated], [server_name], [database_name],                
                [database_id], [family_guid], [S3_object_arn], [overwrite_S3_backup_file],    
                [KMS_master_key_arn], [task_progress], [task_info],                          
                [filepath], [overwrite_file],[task_metadata]
            FROM [rdsadmin].[dbo].[rds_customer_tasks] WITH (NOLOCK)  
            where task_id in (
                select [task_id] from 
			        (SELECT [task_id], rank() over (partition by database_name order by task_id) as ''ranking''                         
                    FROM [rdsadmin].[dbo].[rds_customer_tasks] WITH (NOLOCK)                    
                    WHERE [lifecycle] IN (@lifecycle0) AND [task_type] IN (@taskType0, @taskType1, @taskType2, @taskType3)
                    and (''CREATED'' in (@lifecycle0) and database_name not in (select database_name from [rdsadmin].[dbo].[rds_customer_tasks] where lifecycle=''IN_PROGRESS'') 
					or ''CREATED'' not in (@lifecycle0))) RankedDatabaseName
			    where ranking=1        
                union
                SELECT [task_id]                               
                FROM [rdsadmin].[dbo].[rds_customer_tasks] WITH (NOLOCK)                 
                WHERE [lifecycle] IN (@lifecycle0) AND [task_type] IN (@taskType0, @taskType1, @taskType2, @taskType3) and [task_type] not like ''%RESTORE%''  
            )
            ORDER BY [task_id] ASC;',
                 @taskType0 = N'SSAS_DEPLOY_PROJECT',
                 @taskType1 = N'SSAS_BACKUP_DB',
                 @taskType2 = N'SSAS_RESTORE_DB',
                 @taskType3 = N'SSAS_ADD_DB_ADMIN_MEMBER',
                 @lifecycle0 = N'CREATED';
SELECT @p1;

Honestly unsure what I’m looking at here. Although the table name, rds_customer_tasks is intriguing. Still, difficult to be sure with the tables, their data and structures hidden. Let’s take a look at the sp_executesql commands:

First one I spotted:

exec sp_executesql N'SELECT QUOTENAME(@S,@C)',N'@S nvarchar(21),@C nvarchar(1)',@S=N'rds_database_tracking',@C=N''''

I’m not completely sure what this one is doing either. Looks like there’s a lot of reliance on their own internal stuff to pluck out information. Shame I can’t get a peak at that.

So, instead of just scrolling through and looking at random queries, I started doing searches. For example, what do they do with INSERT statements. That’s when I found a whole bunch of these:

EXEC sp_executesql N'INSERT INTO rdsadmin.dbo.log_backup_manifest (family_guid, rds_sequence_id, backup_round_id, file_epoch, backup_file_time, database_name, lifecycle, file_size_bytes, first_lsn, last_lsn, is_log_chain_broken, database_state, database_recovery_model)VALUES (@familyGuid, @sequenceId, @backupRoundId, @fileEpoch, @backupFileTime, @databaseName, @lifecycle, @fileSizeBytes, @firstLsn, @lastLsn, @isLogChainBroken, @databaseState, @databaseRecoveryModel)',
                   N'@familyGuid uniqueidentifier,@sequenceId int,@backupRoundId int,@fileEpoch bigint,@backupFileTime datetime,@databaseName nvarchar(4000),@lifecycle nvarchar(4000),@fileSizeBytes bigint,@firstLsn decimal(29,0),@lastLsn decimal(29,0),@isLogChainBroken int,@databaseState nvarchar(4000),@databaseRecoveryModel nvarchar(4000)',
                   @familyGuid = 'EBD0FE66-F91F-43F1-99C3-B52D803BB136',
                   @sequenceId = 302,
                   @backupRoundId = 302,
                   @fileEpoch = 1675795261,
                   @backupFileTime = NULL,
                   @databaseName = N'model',
                   @lifecycle = N'EXTRACTING',
                   @fileSizeBytes = NULL,
                   @firstLsn = NULL,
                   @lastLsn = NULL,
                   @isLogChainBroken = NULL,
                   @databaseState = N'ONLINE',
                   @databaseRecoveryModel = N'FULL';

Clearly keeping track of their backups. Not so much monitoring as management, but interesting none the less. I found over 100,000 UPDATE statements, mainly going against the rds_* tables.

There was a bunch of other cryptic stuff around those rds_* tables. Mostly, it looks like they didn’t do tons of the monitoring through procs, but rather through the batches. Still, interesting.

Conclusion

I really don’t have a profound conclusion here. I just wanted to know what happens on AWS RDS when I’m not around. Some of it is very interesting indeed. Some of it is pretty mundane, especially calls to sys.dm_exec_requests for crying out loud. Overall, I wasn’t surprised by any of it.

Next up. Azure SQL Database.

Please let me know what you think about this article or any questions:

This site uses Akismet to reduce spam. Learn how your comment data is processed.