What is Change Data Capture (CDC) and how is it enabled in SQL Server
- How to create a Logon Trigger to control which users are allowed to connect to an Oracle Database - 13 January 2025
- How do we collect the actual execution plan from queries using Extended Event and how do we read its data - 2 December 2024
- How do we find what permissions a user has on an Oracle database - 1 November 2024
Change Data Capture known as CDC records that DML (insert, update, delete) action is performed on a table of a database. These changes are recorded in a corresponding table that is created. Through system functions we can find the changes made to records.
Which is used
The CDC it is very useful for ETL (Extraction, Transformation and Load) processes where we want a gradual loading of the data in a Data Warehouse.
In a Data Warehouse we are interested in depicting the changes over time in a different format, so a complete copy of the data would be delayed and would not meet the needs of the application.
However, it could also be used for necessities auditing but with the main disadvantage that it does not record SELECT which may take place.
Step by step activation with example
We have a table with few entries:
select * from customer;
To enable CDC on the database:
USE db_test GO EXEC sp_changedbowner 'sqlowner' --Προαιρετικά GO EXEC sys.sp_cdc_enable_db; GO
We enable CDC in the panel we saw before:
USE db_test GO EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'customer', @role_name = N'cdc_viewer', @filegroup_name = null, @supports_net_changes = 1 ; GO
As soon as we activate it we will see that some have been created system boards (01) in the base, two new Jobs in SQL Server Agent (02) and finally one or two functions (03) (depending on the parameter you will read below).
In the parameter @role_name
we define the role * that we want to give to users to have access to the data (of course they must also have right SELECT in the source table). If this parameter is not set then the data will be readable only by whoever has the role sysadmin or is db_owner on the basis.
*If the role does not exist, it is created automatically.
With the parameter @supports_net_changes = 1
we declare that we want the function to be created as well cdc.fn_cdc_get_net_changes. This allows us to see the records that changed in a certain time period (it returns a unique change for each change in that time period).
Let's do some DML actions (insert, update, delete) on the table:
insert into db_test..customer values ('Marios','Georgopoulos','Male','209285915'); update customer set sex = 'Female' where id = 1; delete from customer where id = 3;
To find the changes
To see the changes that have occurred in the table, Microsoft suggests the use of functions cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes that are created.
With the cdc.fn_cdc_get_all_changes we can see all the changes that have happened:
DECLARE @from_lsn binary(10), @to_lsn binary(10); SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_customer') SET @to_lsn = sys.fn_cdc_get_max_lsn(); SELECT * FROM [cdc].[fn_cdc_get_all_changes_dbo_customer](@from_lsn, @to_lsn, N' all update old'); GO
We see the changes we made before along with some extra columns with metadata. With the column $start_lsn
we are told the commit LSN that performed the transaction. With the $seqval
we can enqueue changes made in the same transaction. THE $update_mask
reports in bits the columns that changed in case of update.
In $ operation
report on the type of change made. More specifically:
- 1 = delete
- 2 = insert
- 3 = the registration before the update
- 4 = the registration after the update
But we can see with the function cdc.fn_cdc_get_net_changes that we mentioned before the changes that happened in happened in a certain period of time e.g. from the previous day to a certain time:
DECLARE @from_lsn binary(10), @to_lsn binary(10), @end_time datetime,@begin_time datetime; SET @begin_time = GETDATE()-1 SET @end_time = '2020-10-25 15:10:00.000' SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time); SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time); SELECT * FROM [cdc].[fn_cdc_get_net_changes_dbo_customer](@from_lsn, @to_lsn, N' all'); GO
We see that up to this time only the first insert of the changes we made to the table had been made.
To disable CDC on the panel
We can find which tables have CDC enabled with the field is_tracked_by_cdc
in the system view sys.tables
:
select * from sys.tables where is_tracked_by_cdc = 1
To disable it corresponding to enabling it:
USE db_test GO EXEC sys.sp_cdc_disable_table @source_schema = N'dbo', @source_name = N'customer', @capture_instance = N'dbo_customer' ; GO