From SCD2 Bronze to a Non-SCD Silver Layer in Other Tech (Iceberg, Hudi, BigQuery, Fabric)

Modern data platforms consistently separate historical truth from analytical usability by storing full SCD2 history in a Bronze layer and exposing a simplified, current-state Silver layer. Whether using Apache Iceberg, Apache Hudi, Google BigQuery, or Microsoft Fabric, the same pattern applies: Bronze preserves immutable, auditable change history, while Silver removes temporal complexity to deliver one row per business entity. Each platform implements this differently, via snapshots, incremental queries, QUALIFY, or Delta MERGE, but the architectural principle remains universal and essential for regulated environments.

Contents

Introduction: How Iceberg, Hudi, BigQuery, and Fabric Transform Historical Truth into Clean, Analytical Data

Modern regulated data platforms must balance two competing needs: the ability to reconstruct the past exactly as it was, and the ability to analyse the present without unnecessary complexity. Technologies such as Apache Iceberg, Apache Hudi, Google BigQuery, and Microsoft Fabric approach this balance in different ways, but all converge on the same architectural outcome. This introduction frames why separating historical truth from analytical usability is foundational to modern lakehouse design, and why the Bronze-to-Silver pattern continues to emerge regardless of platform choice.

Databricks and Snowflake are not the only lakehouse ecosystems used in modern enterprise architectures. A rapidly growing number of Financial Services institutions are adopting Apache Iceberg, Apache Hudi, Google BigQuery, and Microsoft Fabric/Synapse as core components of their data platforms.

Each of these technologies supports a slightly different approach to building:

  • Bronze layer = full SCD2 history
  • Silver layer = simplified, non-SCD current state

But the end goal is the same:

Preserve every historical change for audit and compliance while delivering clean, simple, current-state data for analytics and AI.

This article explains how each of these platforms supports the transition from a historical SCD2 Bronze layer to a non-SCD Silver layer—and what patterns work best.

This is part 7 of a related series of articles on using SCD2 at the bronze layer of a medallion-based data platform for highly regulated Financial Services (such as the UK).

Earlier articles in this series explored Databrick’s and Snowflake’s implementation in depth. This article demonstrates that the same architectural pattern holds across open table formats and managed analytical platforms.

1. Why the SCD2 Bronze to Non-SCD Silver Pattern Exists in All Modern Platforms

The persistence of the SCD2 Bronze to non-SCD Silver pattern is not driven by tooling trends, but by structural requirements common to regulated data environments. Auditability, lineage, and temporal correctness demand that history be preserved in full, while analytics, reporting, and machine learning demand simplicity and stability. This section explains why these opposing needs cannot be met in a single layer, and why every serious data platform, implicitly or explicitly, separates them.

Regardless of technology, the underlying principles remain constant:

Bronze (SCD2)

  • Stores full temporal history
  • Preserves regulatory lineage
  • Enables point-in-time reconstruction
  • Acts as an immutable source of truth
  • Supports backfills, CDC, audit, and forensic analysis

Silver (Non-SCD)

  • Removes time-based complexity
  • Presents one row per business entity
  • Simplifies queries and joins
  • Provides clean, curated data for analytics
  • Avoids versioning and effective-dating logic

The Bronze layer exists for truth.
The Silver layer exists for usefulness.

This is true in Iceberg, Hudi, BigQuery, Fabric, Synapse, everywhere.

1.1 Design Alternatives

Several platforms can expose current-state data directly from Bronze tables using views or semantic layers. Examples include Iceberg views filtered on current_flag, BigQuery materialized views, and Fabric Direct Lake semantic models. While useful in some scenarios, regulated environments typically prefer materialised Silver tables to avoid unpredictable query cost, unclear ownership boundaries, and accidental coupling of analytical workloads to historical logic. I have called these out below.

1.2 Regulatory Note

Regulators do not require analytical workloads to query historical truth directly, only that historical truth is preserved, traceable, and reproducible. The Bronze/Silver separation satisfies this requirement by isolating immutable history from analytical consumption, while still enabling full auditability and point-in-time reconstruction.

1.3 Operationalising Bronze to Silver Across Platforms

Across all platforms, Bronze-to-Silver pipelines are driven incrementally rather than rebuilt in full. The triggering mechanism differs—Iceberg uses snapshot or timestamp-based scans, Hudi advances via commit timelines, BigQuery relies on partition filtering and scheduled execution, and Fabric tracks Delta versions through pipelines or Dataflows—but the intent is the same. Silver is refreshed only when Bronze changes, and only for affected entities. Operationally, this keeps compute predictable, limits blast radius, and ensures Silver remains a stable analytical contract. Monitoring focuses on pipeline completeness and freshness rather than row-level mutation. In practice, Silver becomes the operational boundary where data is considered analytics-ready.

PlatformIncremental DriverTypical Orchestration
IcebergSnapshot / commit timestampAirflow, dbt
HudiCommit timelineSpark, Flink
BigQueryPartition filtersScheduled Queries, dbt
FabricDelta versioningPipelines, Dataflows

In practice, dbt is commonly used with Iceberg and BigQuery to express Silver transformations declaratively, while external orchestrators handle scheduling and dependency management.

2. Moving from SCD2 Bronze to Silver in Apache Iceberg

Apache Iceberg introduces table-format capabilities that make historical data easier to manage without sacrificing analytical performance. Its snapshot-based design, metadata-driven planning, and incremental scanning change how engineers think about movement between layers. This section sets the scene for how Iceberg’s core design choices naturally support the transition from immutable SCD2 history to a clean, current-state Silver representation.

Apache Iceberg is becoming the default open table format in many lakehouse platforms, including AWS EMR, Snowflake Polaris, Dremio, Starburst, and BigQuery’s new Iceberg-native features.

Key Iceberg Capabilities for Bronze → Silver

  • Metadata-rich snapshots
  • Partition evolution (change partitioning over time)
  • Hidden partitioning (users don’t need to understand partition columns)
  • Position deletes and equality deletes
  • Time travel
  • Incremental scanning (“scan only data changed since snapshot X”)

These features make Iceberg surprisingly well-suited for SCD2 pipelines.

2.1 Step 1: Use Incremental Scans Instead of Full Table Reads

Iceberg supports incremental reads based on commit snapshots:

SELECT * 
FROM bronze_scd2
FOR SYSTEM_TIME AS OF SNAPSHOT <snapshot-id>

Or incrementally:

SELECT * 
FROM bronze_scd2
WHERE _change_type IN ('INSERT', 'UPDATE')
  AND _change_timestamp > last_processed_time

This lets Silver pipelines process only new changes, not billions of rows.

2.2 Step 2: Filter for Current Rows Only

Iceberg SCD2 tables typically store:

  • effective_from
  • effective_to
  • current_flag

So Silver logic is:

INSERT OVERWRITE silver_customers
SELECT customer_id, name, status
FROM bronze_scd2
WHERE current_flag = true;

Or:

WHERE effective_to = '9999-12-31';

2.3 Step 3: Write a Clean, Non-SCD Silver Table

Iceberg supports schema evolution and partition evolution, making it easy to maintain Silver tables over time.

Silver tables should:

  • Drop SCD2 metadata columns
  • Keep only current attributes
  • Be partitioned by business keys where appropriate
  • Be materialised using SQL engines (Spark, Trino, Flink, Snowflake, BigQuery)

Example: Materialising the Current-State Silver Table

INSERT OVERWRITE silver_customers
SELECT customer_id, name, status
FROM bronze_scd2
WHERE current_flag = true;

2.4 Step 4: Use Equality Deletes for Deduplication

Iceberg equality deletes allow you to remove old “current” rows efficiently.

Example:

DELETE FROM silver_customers
WHERE customer_id IN (SELECT customer_id FROM changes);

2.5 Why Iceberg Works Well for Bronze to Silver

  • Efficient incremental scanning
  • Flexible schema evolution
  • Extremely fast point-lookups (with hidden partitioning)
  • Strong durability guarantees

Iceberg is one of the best open-source formats for SCD2 → Silver workflows.

Alternatives such as views filtered on current_flag can expose current-state data directly from Bronze tables. In regulated environments, however, materialised Silver tables are preferred to avoid accidental coupling of analytical workloads to historical logic and snapshot semantics.

3. Moving from SCD2 Bronze to Silver in Apache Hudi

Unlike general-purpose table formats, Apache Hudi was designed with change data capture and upserts as first-class concerns. As a result, Bronze-to-Silver flows in Hudi feel less like transformations and more like controlled materialisations of state. This section provides context on how Hudi’s CDC-centric architecture shapes its approach to deriving current-state data from historical records.

Hudi was designed explicitly for CDC and upsert-heavy workloads, making it naturally aligned with SCD2.

Hudi offers two storage modes:

  • Copy On Write (CoW) → great for SCD2 Bronze
  • Merge On Read (MoR) → great for real-time views

3.1 Step 1: Use Hudi Incremental Queries for Bronze

Hudi supports incremental pull:

SELECT *
FROM bronze_scd2
WHERE _hoodie_commit_time > last_commit

This is ideal for driving incremental Silver refreshes.

3.2 Step 2: Identify Current Records via Precombine + Op Column

Hudi uses fields like:

  • _hoodie_is_deleted
  • _hoodie_commit_time
  • _hoodie_operation

Silver logic:

SELECT *
FROM bronze_scd2
WHERE _hoodie_is_deleted = false
  AND is_current = true;

3.3 Step 3: Materialise the Silver Table Optimally

With Hudi’s upsert-heavy design, you can maintain a clean Silver table using:

  • Spark structured streaming
  • Flink
  • Presto/Trino
  • Dremio

Silver tables should be:

  • Lightweight
  • Stable
  • Schema-normalised
  • Partitioned by business key if applicable

Example: Deriving the Current Silver View

SELECT customer_id, name, status
FROM bronze_scd2
WHERE _hoodie_is_deleted = false
AND is_current = true;

3.4 Why Hudi Works Well for Bronze to Silver

  • Built for CDC-heavy workloads
  • Upserts are cheap and fast
  • Great for near-real-time Silver tables
  • Strong integration with Spark, Flink, and Data Streams

Hudi real-time views can expose the latest state directly from Bronze storage. While useful for low-latency use cases, regulated analytical environments typically prefer a materialised Silver table to provide a stable, query-optimised contract.

4. Moving from SCD2 Bronze to Silver in Google BigQuery

BigQuery operates without an external table format layer, yet it still supports the same architectural separation through its SQL engine and storage optimisations. Rather than relying on metadata-driven snapshots or incremental files, BigQuery leans on query-time semantics and columnar performance. This section frames how BigQuery achieves the Bronze-to-Silver outcome using a fundamentally different execution model.

BigQuery is not a table-format engine like Iceberg or Hudi.
Instead, it uses an append-only columnar store where MERGE operations rewrite microblocks.

But BigQuery has excellent features for SCD2 → Silver:

  • QUALIFY for window functions
  • Partition + clustering
  • Storage-level time travel
  • Partition pruning
  • Automatic columnar compression
  • Extremely fast SQL operations

4.1 Step 1: Use MERGE + Partitioning for SCD2 Bronze

Bronze SCD2 uses typical pattern:

MERGE INTO bronze_scd2 ...

Partitioned by:

  • effective_from (DATE)

Clustered by:

  • business_key

4.2 Step 2: Build the Silver Table Using QUALIFY

BigQuery’s QUALIFY makes deduping trivial:

SELECT * 
FROM bronze_scd2
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY effective_from DESC) = 1;

This instantly produces the current record only.

4.3 Step 3: Materialise as a Silver Table

Silver tables should:

  • drop SCD2 columns
  • flatten and standardise schema
  • be reclustered occasionally to maintain performance
  • be materialised as a table, not just a view

BigQuery views on large SCD2 tables can be expensive at query time.

Example: Creating a Materialised Silver Table

CREATE OR REPLACE TABLE silver_customers AS
SELECT customer_id, name, status
FROM bronze_scd2
QUALIFY ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY effective_from DESC
) = 1;

4.4 Why BigQuery Works Well for Bronze → Silver

  • QUALIFY + window functions = easy deduplication
  • Super-fast scan performance
  • Automatic compression
  • Easy incremental refresh using partition filters
  • No need for Streams/Tasks because BigQuery query engine is so fast

BigQuery materialized views can present current-state records without a separate Silver table. In practice, materialised Silver tables are often favoured to control cost, ensure predictable performance, and avoid repeated window-function evaluation at query time.

5. Moving from SCD2 Bronze to Silver in Microsoft Fabric / Synapse

Microsoft Fabric and Synapse blend lakehouse and warehouse concepts under a unified governance and analytics experience. While Delta Lake underpins storage, much of the Silver transformation work is expressed through higher-level tools such as SQL endpoints, pipelines, and dataflows. This section sets the context for how Bronze-to-Silver patterns are operationalised within Microsoft’s integrated analytics ecosystem.

Fabric and Synapse both use Delta Lake under the hood (in OneLake), but with Microsoft-specific tooling layered on top.

Fabric Advantages

  • Lakehouse and Warehouse engines both available
  • Delta format storage
  • Pipelines integrated into Power BI and Dataflows
  • Strong MDM and governance tooling

5.1 Step 1: Bronze SCD2 Using Delta MERGE

Identical to Databricks SCD2 pattern.

Have a look at the preceding article: “From SCD2 Bronze to a Non-SCD Silver Layer in Databricks“.

5.2 Step 2: Silver Using Lakehouse SQL or Dataflows

Silver can be generated using:

  • SQL in Lakehouse
  • Dataflows Gen2
  • Notebooks
  • Pipelines

Filtering current rows:

SELECT *
FROM bronze_scd2
WHERE IsCurrent = 1;

5.3 Step 3: Materialise Silver as a Delta Table or Warehouse Table

Fabric’s Warehouse layer provides extremely fast query performance.
Silver tables should:

  • remove historical metadata
  • standardise schema
  • expose stable domain structures

Power BI models should always use Silver, never Bronze.

Example: Materialising Silver in the Lakehouse

CREATE OR REPLACE TABLE silver_customers AS
SELECT customer_id, name, status
FROM bronze_scd2
WHERE IsCurrent = 1;

5.4 Why Fabric Works Well for Bronze to Silver

  • Delta format underpins the entire OneLake
  • Easy integration with Power BI and analytical reporting
  • Dataflows can automate Silver transformations visually
  • Good for enterprises already in the Microsoft stack

Direct Lake semantic models can surface current-state data directly from Bronze tables. For enterprise reporting and governance, however, explicit Silver tables provide clearer ownership boundaries and reduce semantic-layer complexity.

6. Summary: Bronze Holds Truth, Silver Serves Insight – Across All Platforms

Across technologies, vendors, and execution models, the same architectural truth emerges: historical accuracy and analytical usability are different problems requiring different representations. This concluding section reframes the platform-specific examples into a single, technology-agnostic principle—one that explains why the Bronze-to-Silver pattern continues to anchor modern data architectures, particularly in highly regulated industries.

Whether you’re using Iceberg, Hudi, BigQuery, or Microsoft Fabric, the core principles are universal:

  • Bronze Layer
    • SCD2 history
    • Time-travel lineage
    • Regulatory auditability
  • Silver Layer
    • One row per entity
    • Clean, curated, simplified data
    • Built using incremental Bronze changes
    • Optimised for analytics, ML, and reporting

Each platform provides different mechanisms: Iceberg snapshots, Hudi incremental queries, BigQuery QUALIFY, Fabric Delta Lake, but the architectural pattern remains consistent and effective.

This SCD2 Bronze to Non-SCD Silver pattern is the backbone of modern data engineering, especially in regulated environments like UK Financial Services.

Databricks and Snowflake are not the exception; they are both expressions of a universal Bronze-to-Silver design.