Microsoft Fabric: Native CDC for SQL Estates and SCD Type 2 Released in Copy Job
Summary
Microsoft has announced the general availability (GA) of Change Data Capture (CDC) for SQL estates in Microsoft Fabric’s Copy Job. The feature provides low-latency replication of inserts, updates, and deletes from Azure SQL Database, SQL Server, and Azure SQL Managed Instance into Fabric destinations. Separately, Copy Job now offers Slowly Changing Dimension (SCD) Type 2 support in Preview: a native, no-code option with effective dating and soft-delete handling for Fabric Data Warehouse and Synapse SQL Pool destinations.
What happened
Microsoft Fabric has released two major enhancements to the Copy Job integration tool in Data Factory:
- Native CDC for SQL Estates (Generally Available - GA): Copy Job now supports native data replication via CDC directly from SQL sources. This eliminates the need for complex, manual watermark-based logic. Supported sources include Azure SQL Database, on-premises SQL Server, and Azure SQL Managed Instance, replicating to destinations such as Fabric SQL, Lakehouses, Data Warehouses, and Snowflake.
- SCD Type 2 Support (Preview): When copying data to a Fabric Data Warehouse or Synapse SQL Pool, teams can now configure SCD Type 2 natively. The Copy Job automatically manages the logic to expire old records (e.g., setting Valid_To dates) and insert new ones. It also handles soft deletes by marking deleted source records as inactive in the target instead of deleting them.
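The SCD Type 2 behavior described above (expire the old version, insert the new one, soft-delete instead of removing) can be sketched in plain Python. This is a minimal illustration of the general technique, not Copy Job's actual implementation: the function apply_scd2, the in-memory list used as the target, and the column names valid_from, valid_to, and is_active are all assumptions made for the example, and the metadata columns Fabric generates may be named differently.

```python
from datetime import datetime

# Hypothetical convention: an open-ended (current) version has valid_to = None.
OPEN_END = None


def apply_scd2(target, changes, now):
    """Apply change records to an SCD Type 2 table held as a list of dicts.

    target  : list of rows with keys id, attrs, valid_from, valid_to, is_active
    changes : list of (op, key, attrs); op is "insert", "update", or "delete"
    now     : timestamp used for effective dating
    """
    for op, key, attrs in changes:
        # Find the current (open-ended) version of this key, if one exists.
        current = next(
            (r for r in target if r["id"] == key and r["valid_to"] is OPEN_END),
            None,
        )
        if op in ("insert", "update"):
            if current is not None:
                if current["attrs"] == attrs:
                    continue  # no attribute change, keep the current version
                # Expire the old version rather than overwriting it.
                current["valid_to"] = now
                current["is_active"] = False
            # Insert the new version as the current, open-ended row.
            target.append({
                "id": key,
                "attrs": attrs,
                "valid_from": now,
                "valid_to": OPEN_END,
                "is_active": True,
            })
        elif op == "delete" and current is not None:
            # Soft delete: close the row and mark it inactive, but keep it.
            current["valid_to"] = now
            current["is_active"] = False
    return target


if __name__ == "__main__":
    table = []
    apply_scd2(table, [("insert", 1, {"city": "Oslo"})], datetime(2024, 1, 1))
    apply_scd2(table, [("update", 1, {"city": "Bergen"})], datetime(2024, 2, 1))
    apply_scd2(table, [("delete", 1, None)], datetime(2024, 3, 1))
    for row in table:
        print(row)
```

After the three calls in the usage example, the table holds two rows for key 1: the expired Oslo version and the Bergen version, which is closed and inactive because of the delete. No row is physically removed, which is the point of the soft-delete handling.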
Why it matters
These updates address core data engineering challenges:
- Near Real-Time Synchronization: Instead of performing resource-heavy full table reloads, CDC moves data incrementally. This reduces source system strain and significantly lowers replication latency.
- No-Code History Tracking: SCD Type 2 is one of the most complex transformations in data warehousing. Built-in support in Copy Job saves data engineers hours of building and maintaining custom pipelines or stored procedures.
- Robust Delete Handling: Built-in soft-delete processing keeps deleted source rows in the target as inactive records, so the historical record stays complete and auditable.
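To make the contrast with manual watermark logic concrete, the sketch below compares a watermark-based pull with applying an ordered change feed. It is a simplified model under stated assumptions: watermark_pull, apply_change_feed, the modified_at column, and the (op, key, row) record shape are hypothetical names chosen for illustration and do not reflect Copy Job's internal format or the CDC tables SQL Server exposes.

```python
def watermark_pull(source_rows, last_watermark):
    """Manual incremental load: return rows modified after the stored watermark.

    A row deleted at the source is simply absent from source_rows, so this
    query has no way to report the delete.
    """
    return [r for r in source_rows if r["modified_at"] > last_watermark]


def apply_change_feed(target, feed):
    """Apply an ordered feed of (op, key, row) changes to a dict keyed by id."""
    for op, key, row in feed:
        if op == "delete":
            target.pop(key, None)  # deletes arrive as explicit change records
        else:
            target[key] = row      # insert and update both act as an upsert
    return target


if __name__ == "__main__":
    # Source after the changes: row 1 was updated, row 2 was deleted.
    source_now = [{"id": 1, "name": "a2", "modified_at": 5}]

    # The watermark query sees the update but not the delete.
    print(watermark_pull(source_now, last_watermark=3))

    # The change feed carries both operations, so the target converges.
    target = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
    feed = [("update", 1, {"id": 1, "name": "a2"}), ("delete", 2, None)]
    print(apply_change_feed(target, feed))
```

In this model the watermark approach would leave row 2 in the target indefinitely unless extra reconciliation logic is added, which is the kind of manual work the native CDC path is meant to remove.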
Evidence
- Microsoft Fabric Updates Blog: Ye Xu (Principal Program Manager) detailed the benefits and setup of the Copy Job with CDC for SQL estates.
- Microsoft Learn (“What’s New in Fabric”): The official documentation lists the General Availability of the CDC feature and the public preview of SCD Type 2 capabilities.
- Community Insights: Users on the Fabric Community forum highlight the importance of avoiding manual column mapping when using the native SCD Type 2 feature to allow Fabric to auto-generate and manage tracking metadata.
Analysis
With these updates, Microsoft Fabric positions Copy Job as the primary no-code/low-code integration path for cross-cloud and cross-tenant data movement. By automating SCD Type 2, Fabric turns a task that historically required a PySpark notebook or a T-SQL stored procedure into a declarative configuration. The GA of CDC for SQL Server and Azure SQL strengthens Fabric’s readiness for hybrid enterprise environments. It narrows the gap between operational SQL engines and the analytical OneLake, shortening the path to near real-time reporting and AI applications.
Practical Takeaways
- For Data Engineers: Adopt the native CDC in Copy Job for your SQL estates to modernize batch processes and decrease load on operational transactional databases.
- For Data Warehouse Architects: Evaluate the native SCD Type 2 feature for Fabric Warehouses. Remember to skip manual column mapping so Fabric can automatically create and populate historical metadata.
- For Developers: If your source database does not support CDC or requires custom transformation rules, Dataflow Gen2 and PySpark Notebooks remain the standard alternatives.
Open Questions
- What is the capacity unit (CU) consumption impact of running continuous CDC replication on highly transactional, multi-terabyte SQL estates?
- When will native SCD Type 2 support expand to Lakehouses (Delta destinations)? It currently targets Fabric Data Warehouse and Synapse SQL Pool.
- How does the soft-delete detection scale under high-frequency delete volumes during the preview phase?