Tag Archives: Delta Lake

Managing a Rapidly Growing SCD2 Bronze Layer on Databricks: Best Practices and Practical Guidance, Ready for AI Workloads

Slowly Changing Dimension Type 2 (SCD2) patterns are increasingly used in the Bronze layer of Databricks-based platforms to meet regulatory, analytical, and historical data requirements in Financial Services. However, SCD2 Bronze tables grow rapidly and can become costly, slow, and operationally fragile if not engineered carefully. This article provides practical, production-tested guidance for managing large-scale SCD2 Bronze layers on Databricks using Delta Lake. It focuses on performance, cost control, metadata health, and long-term readiness for analytics and AI workloads in regulated environments.
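
As a taste of the maintenance side of that guidance, here is a minimal PySpark sketch of routine Delta Lake housekeeping for an SCD2 Bronze table; the table name, ZORDER columns, and retention window are illustrative assumptions, not taken from the article.

    # Routine Delta Lake maintenance for a large SCD2 Bronze table (names assumed).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Compact small files and co-locate rows on the columns temporal queries filter by.
    spark.sql("""
        OPTIMIZE bronze.customers_scd2
        ZORDER BY (business_key, valid_from)
    """)

    # Drop unreferenced files while keeping 30 days of time travel for audit;
    # the retention period is a policy decision, not a technical default.
    spark.sql("VACUUM bronze.customers_scd2 RETAIN 720 HOURS")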

Continue reading

From SCD2 Bronze to a Non-SCD Silver Layer in Other Tech (Iceberg, Hudi, BigQuery, Fabric)

Modern data platforms consistently separate historical truth from analytical usability by storing full SCD2 history in a Bronze layer and exposing a simplified, current-state Silver layer. Whether using Apache Iceberg, Apache Hudi, Google BigQuery, or Microsoft Fabric, the same pattern applies: Bronze preserves immutable, auditable change history, while Silver removes temporal complexity to deliver one row per business entity. Each platform implements this differently (via snapshots, incremental queries, QUALIFY, or Delta MERGE), but the architectural principle remains universal and essential for regulated environments.
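
For a platform-neutral flavour of that Silver step, the sketch below uses a Spark window function, the engine-agnostic equivalent of BigQuery's QUALIFY, to keep one row per business entity; the table and column names (business_key, valid_from and so on) are assumptions for illustration.

    # Reduce SCD2 Bronze history to one current row per business entity.
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    bronze_df = spark.table("bronze.customers_scd2")  # assumed SCD2 table

    # Rank each entity's versions newest-first, then keep only the latest.
    latest = Window.partitionBy("business_key").orderBy(F.col("valid_from").desc())
    silver_df = (
        bronze_df
        .withColumn("rn", F.row_number().over(latest))
        .filter("rn = 1")
        .drop("rn", "valid_from", "valid_to", "is_current")  # strip temporal columns
    )

    silver_df.write.mode("overwrite").saveAsTable("silver.customers_current")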

Continue reading

From SCD2 Bronze to a Non-SCD Silver Layer in Databricks

This article explains a best-practice Databricks lakehouse pattern for transforming fully historical SCD2 Bronze data into clean, non-SCD Silver tables. Bronze preserves complete temporal truth for audit, compliance, and investigation, while Silver exposes simplified, current-state views optimised for analytics and data products. Using Delta Lake features such as MERGE, Change Data Feed, OPTIMIZE, and ZORDER, organisations, particularly in regulated Financial Services, can efficiently maintain audit-proof history while delivering fast, intuitive, consumption-ready datasets.
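
As a hedged illustration of the incremental route, the sketch below reads the Bronze table's Change Data Feed and MERGEs the latest change per entity into Silver. It assumes CDF is enabled on the Bronze table, that the Silver table already exists, and that the table names, columns, and starting version are placeholders.

    # Incremental upsert from SCD2 Bronze (via Change Data Feed) into non-SCD Silver.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, Window
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Read rows changed since the last processed version (checkpoint value assumed).
    changes = (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", 1042)
        .table("bronze.customers_scd2")
        .filter(F.col("_change_type").isin("insert", "update_postimage"))
    )

    # Keep only the newest change per key so MERGE sees one source row per entity,
    # then drop CDF metadata and SCD2 temporal columns before writing to Silver.
    latest = Window.partitionBy("business_key").orderBy(F.col("_commit_version").desc())
    changes = (
        changes.withColumn("rn", F.row_number().over(latest))
        .filter("rn = 1")
        .drop("rn", "_change_type", "_commit_version", "_commit_timestamp",
              "valid_from", "valid_to", "is_current")
    )

    silver = DeltaTable.forName(spark, "silver.customers_current")
    (
        silver.alias("s")
        .merge(changes.alias("c"), "s.business_key = c.business_key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )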

Continue reading

Advanced SCD2 Optimisation Techniques for Mature Data Platforms

Advanced SCD2 optimisation techniques are essential for mature Financial Services data platforms, where demands for historical accuracy, regulatory traceability, and scale exceed the limits of basic SCD2 patterns. Attribute-level SCD2 significantly reduces storage and computation by tracking changes per column rather than per row. Hybrid SCD2 pipelines, combining lightweight delta logs with periodic MERGEs into the main Bronze table, minimise write amplification and improve reliability. Hash-based and probabilistic change detection eliminate unnecessary updates and accelerate temporal comparison at scale. Together, these techniques enable high-performance, audit-grade SCD2 in platforms such as Databricks, Snowflake, BigQuery, Iceberg, and Hudi, supporting the long-term data lineage and reconstruction needs of regulated UK Financial Services institutions.
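
Of those techniques, hash-based change detection is the easiest to show in a few lines: the sketch below computes a SHA-256 digest over the tracked attributes and keeps only genuinely new or changed rows. The column and table names and the null sentinel are assumptions for illustration.

    # Hash-based change detection: compare one digest per row instead of every column.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    TRACKED = ["name", "address", "risk_rating", "segment"]  # SCD2-tracked attributes

    def with_row_hash(df):
        # Coalesce nulls to a sentinel so NULLs hash deterministically and
        # column shifts cannot collide, then digest the concatenation.
        cols = [F.coalesce(F.col(c).cast("string"), F.lit("<null>")) for c in TRACKED]
        return df.withColumn("row_hash", F.sha2(F.concat_ws("||", *cols), 256))

    incoming = with_row_hash(spark.table("staging.customers_feed"))
    current = with_row_hash(
        spark.table("bronze.customers_scd2").filter("is_current = true")
    )

    # New keys (no match) or differing digests are the only rows needing a new version.
    changed = (
        incoming.alias("i")
        .join(current.alias("c"),
              F.col("i.business_key") == F.col("c.business_key"), "left")
        .filter(F.col("c.business_key").isNull() |
                (F.col("i.row_hash") != F.col("c.row_hash")))
        .select("i.*")
    )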

Continue reading

Using SCD2 in the Bronze Layer with a Non-SCD2 Silver Layer: A Modern Data Architecture Pattern for UK Financial Services

UK Financial Services firms increasingly implement SCD2 history in the Bronze layer while providing simplified, non-SCD2 current-state views in the Silver layer. This pattern preserves full historical auditability for FCA/PRA compliance and regulatory forensics, while delivering cleaner, faster, easier-to-use datasets for analytics, BI, and data science. It separates “truth” from “insight,” improves governance, supports Data Mesh models, reduces duplicated logic, and enables deterministic rebuilds across the lakehouse. In regulated UK Financial Services today, it is the only pattern I have seen that satisfies the full, real-world constraint set with no material trade-offs.

Continue reading