From 20-year-old warehouse SCDs to a modern temporal backbone you can trust. This article lays out a practical, regulator-aware playbook for migrating legacy EDW SCD dimensions to a modern SCD2 Bronze layer in a medallion/lakehouse architecture. It covers what you are really migrating (semantics, not just tables), how to treat the EDW as a source system, how to build canonical SCD2 Bronze, how to run both platforms in parallel, and how to prove to auditors and regulators that nothing has been lost or corrupted in the process.
Contents
- Contents
- 1. Introduction: Why EDW to Lakehouse SCD Migration Is Hard
- 2. What You’re Actually Migrating (Hint: Not Just Tables)
- 3. Core Principles for a Safe SCD Migration
- 4. Migration Patterns: How to Treat the Legacy EDW
- 5. The Step-by-Step Playbook
- 5.1 Inventory and Classification — understand what you actually depend on
- 5.2 Decide the Target Bronze Model — define ‘good’ before you build
- 5.3 Build Raw/Base Feeds — observe before replacing
- 5.4 Construct Canonical SCD2 Bronze — make history explicit
- 5.5 Parallel Run & Reconciliation — prove equivalence
- 5.6 Decommissioning & Archival — exit safely
- 6. Common Pitfalls and Anti-Patterns
- 7. Regulatory and Risk Considerations
- 8. Example: Migrating a Customer Dimension
- 9. Summary: From Legacy SCDs to a Future-Proof Bronze
1. Introduction: Why EDW to Lakehouse SCD Migration Is Hard
Most UK Financial Services organisations sit on 15–25 years of Slowly Changing Dimension (SCD) logic embedded in Teradata, Oracle, DB2, or SQL Server enterprise data warehouses. Those dimensions underpin regulatory reporting, risk models, MI packs, and remediation work that has run for a decade or more.
As lakehouse platforms mature, the question is no longer “should we move?” but “how do we move without breaking everything?”
For many UK banks, insurers, and payment firms, the core dimensions that run the business live in a legacy EDW:
- CUSTOMER / PARTY
- ACCOUNT
- PRODUCT / INSTRUMENT
- ORGANISATION / COST CENTRE
- CONTRACT / POLICY / AGREEMENT
They have:
- deep history (10–20+ years)
- embedded SCD logic (often hand-crafted and opaque)
- dozens of downstream dependencies (risk, finance, MI, AML, conduct, pricing)
- regulatory reliance (regulators assume these are the “truth layers”)
Simply “lifting and shifting” those tables into a lakehouse is tempting and usually wrong.
You are not just moving data. You are:
- moving temporal semantics,
- moving business meaning,
- moving regulator-relied-upon logic,
- and changing the system of record for history.
This needs to be approached deliberately, with clear phases, strong reconciliation, and a realistic view of coexistence.
This article aims to offer the step-by-step playbook I’ve seen work in practice in UK FS, rather than a hand-wavy “just rebuild it in Spark” answer.
Part of the “land it early, manage it early” series on SCD2-driven Bronze architectures for regulated Financial Services. Playbook for EDW SCD to lakehouse Bronze migration, for migration teams, architects, and legacy owners who need to preserve history during transition. This article gives steps to modernise without losing regulatory fidelity.
2. What You’re Actually Migrating (Hint: Not Just Tables)
It’s important to be honest about what a 20-year-old EDW dimension really contains.
You’re migrating at least four things:
- Data
- The physical rows and columns.
- SCD history, surrogate keys, effective dates, flags, and attributes.
- Temporal Logic
- How SCD1 vs SCD2 vs SCD0 is applied.
- How changes are detected (field-by-field, hashes, triggers, etc.).
- How overlapping or missing periods are handled.
- Business Semantics
- What “customer” means.
- How legal name vs known-as vs trading-as is handled.
- How householding, relationships, and party links are implemented.
- Embedded golden-source and precedence rules (even if never explicitly documented).
- Regulatory Trust
- The fact that PRA/FCA have implicitly or explicitly accepted these structures in past reviews.
- The lineage assumptions in existing models and reports.
- The “we’ve used this dimension for 10 years and no-one has died yet” factor.
A credible lakehouse migration must respect all four.
If you only move the data and ignore the rest, you will get something that looks superficially similar but behaves differently under pressure — especially in point-in-time (PIT) analysis and remediation programmes.
3. Core Principles for a Safe SCD Migration
A few principles help keep you honest:
- Treat the legacy EDW as a source system, not as the new Bronze.
The lakehouse Bronze layer should be your long-term temporal backbone, not a mirror of a warehouse that is already being retired. - Do not try to be clever on day one.
The first goal is equivalence — “the new platform reproduces the old results” — not elegance. - Separate reconstruction from redesign.
- Phase 1: build a Bronze that can reconstruct EDW behaviour.
- Phase 2: evolve towards attribute-level SCD2, better keys, and improved domain models.
- Make semantics explicit.
If a SCD2 CUSTOMER dimension hides precedence or golden-source logic, bring that out into metadata and configuration as you migrate. - Coexist gracefully.
Expect a multi-year period where EDW and lakehouse run side-by-side. Design with dual-running in mind. - Prove equivalence with numbers, not slides.
Reconciliation reports, PIT comparisons, and row-level diffs matter more than architecture diagrams.
4. Migration Patterns: How to Treat the Legacy EDW
There is no single “correct” way to position a legacy EDW during migration. Different organisations, domains, and timelines demand different compromises between speed, risk, and long-term cleanliness.
Understanding the available patterns — and their consequences — helps avoid accidental commitments that are hard to unwind later.
There are three main ways to treat the EDW in your new architecture. In practice, you’ll often use a combination.
4.1 EDW as a “Transitional Bronze”
You land EDW SCD dimensions into the lakehouse and label them as Transitional Bronze:
- they keep their existing keys and semantics;
- they feed existing reports while you build the new world;
- they are not your long-term Bronze.
Pros:
- low friction early;
- easy to stand up;
- provides a familiar anchor for teams.
Cons:
- risk of re-entrenching legacy design;
- might delay true modernisation.
4.2 EDW as Just Another Source System
You treat the EDW as one source among many feeding the canonical Bronze:
- EDW outputs are ingested into Raw/Base.
- Bronze SCD2 tables are built from upstream sources and EDW extracts.
- Precedence rules decide when EDW “wins” vs core systems vs new platforms.
Pros:
- cleaner long-term posture: EDW is just a source, not the backbone.
- migrations can be done domain-by-domain.
Cons:
- requires good metadata and strong precedence modelling.
- more complex to explain initially.
4.3 Full Rebuild from Operational Sources
You largely ignore the EDW as a source of SCD logic, and instead rebuild SCD2 directly from:
- core banking
- policy admin
- CRM
- AML/KYC
- risk systems
Pros:
- cleanest design;
- best long-term fidelity to operational truth;
- avoids inheriting 20-year-old “warts”.
Cons:
- highest risk;
- requires extremely robust backfill & historical reconstruction;
- harder to argue as “equivalent” in the short term.
A pragmatic approach in UK FS is:
Use pattern 4.2 and 4.3 together:
treat the EDW as a temporary source of last resort, and gradually phase it out as upstream coverage improves.
5. The Step-by-Step Playbook
Large EDW migrations fail less often because of bad technology choices and more often because teams underestimate sequencing, evidence, and coexistence. Moving decades of SCD logic safely requires a controlled progression that preserves regulatory confidence while gradually shifting the system of record.
What follows is the practical sequence that works in UK Financial Services: starting with understanding what you actually have, moving through controlled reconstruction, and only then transitioning ownership of history. The emphasis throughout is on equivalence, evidence, and reversibility — not speed.
5.1 Inventory and Classification — understand what you actually depend on
Before any data is moved, the organisation needs an honest picture of what its EDW dimensions actually do in practice. These tables are rarely “just dimensions”; they are embedded in regulatory reporting, risk calculations, remediation logic, and long-standing business processes.
Without a clear inventory of dependencies, semantics, and criticality, migration decisions become guesswork — and guesswork is exactly what regulators punish later.
Start by being brutally clear about what you have.
For each EDW dimension (e.g. CUSTOMER, ACCOUNT, PRODUCT):
- Catalogue its role
- which reports use it;
- which regulatory returns depend on it;
- which models ingest it.
- Identify its SCD patterns
- SCD0, SCD1, SCD2 (and any SCD3-like structures);
- effective date semantics;
- surrogate key patterns;
- “current flag” usage.
- Trace its sources
- which upstream systems feed it;
- how often;
- via what mechanisms (batch, CDC, files);
- whether any logic is hard-coded in ETL.
- Assess its data quality
- overlapping periods;
- gaps;
- misaligned “current” flags;
- duplicate business keys.
- Classify it by criticality
- Tier 1: regulatory and risk critical.
- Tier 2: important but not immediately regulatory.
- Tier 3: nice-to-have.
Migrate Tier 1 dimensions with the most care and in partnership with risk/compliance.
5.2 Decide the Target Bronze Model — define ‘good’ before you build
A migration without a clear target quickly degenerates into a series of tactical fixes. Defining the target Bronze model upfront creates a stable reference point against which equivalence, gaps, and improvements can be judged.
At this stage, the goal is not perfection. It is to define a model that can faithfully express historical change, accept data from both EDW and operational sources, and support point-in-time reconstruction without hidden assumptions.
Define what “good” looks like for the new Bronze layer for each domain.
For example, a target CUSTOMER_BRONZE_SCD2 model might include:
- natural business keys (customer_id, party_id, person_id)
- surrogate keys for SCD2 versions (customer_sk)
- effective_from, effective_to, is_current
- attribute-level groupings (identity, contact, risk, AML, CRM)
- hashes per attribute group
- provenance attributes (source_system, precedence_rule, load_batch_id, lineage references)
At this stage, do not over-optimise. You want a target model that can:
- be populated from both EDW and upstream systems;
- express SCD2 history cleanly;
- support PIT reconstruction and Entity Resolution;
- evolve towards attribute-level SCD2.
Capture this as:
- a data contract;
- a model spec;
- and, ideally, tested dbt/SQL/Delta/DDL artefacts.
5.3 Build Raw/Base Feeds — observe before replacing
Once the target is clear, the next constraint is evidence. The lakehouse must be able to observe both the legacy EDW and the underlying operational systems without altering their meaning.
This phase is about creating clean, inspectable feeds that allow history to be compared, reconciled, and eventually trusted — not yet about replacing anything.
Next, make sure you can actually feed Bronze.
- From operational systems
- Land raw extracts and CDC to Raw.
- Build Base tables that normalise and lightly clean data, but don’t change meaning.
- From the EDW
- Land existing SCD dimensions into Raw/Base (e.g. nightly extracts).
- Capture EDW surrogate keys, effective dates, and flags intact.
- Tag EDW as
source_system = 'EDW'or similar.
At this point, the lakehouse is still largely observing, not yet replacing.
You’re building the plumbing that will eventually feed canonical Bronze.
5.4 Construct Canonical SCD2 Bronze — make history explicit
This is where the migration becomes real. The Bronze layer is where legacy semantics, operational truth, and regulatory expectations are brought together into a single temporal backbone.
The challenge is not to “improve” history prematurely, but to construct a Bronze representation that can reproduce EDW behaviour deterministically while creating a path to something better. Provenance, versioning, and restraint matter more here than elegance.
Now the core step: build a canonical SCD2 Bronze dimension that can be populated both from EDW and, over time, from direct upstream sources.
For a given entity (say CUSTOMER):
- Initial population (EDW-driven)
- Use the EDW SCD history as the primary input.
- Map EDW fields to the target Bronze model.
- Preserve EDW effective_from/effective_to as far as possible.
- Map EDW surrogate keys to Bronze surrogate keys (or use EDW surrogate as a seed).
- Overlay upstream source events
- For recent history, or as upstream feeds are ready, use operational CDC to “overlay” or validate EDW history.
- Where EDW history is incomplete or incorrect, treat EDW as lower-precedence than upstream sources and correct Bronze accordingly.
- Introduce attribute-level SCD2 where beneficial
- For very wide dimensions with differing attribute volatility, split into attribute groups.
- Use EDW only as a filler where native history is missing.
- Track provenance explicitly
- For every Bronze SCD2 row, include provenance fields such as:
source_system(or combination of systems)edw_surrogate_key(if applicable)precedence_rule_idis_edw_derivedboolean
- This is vital when you are challenged on “how did you derive this history?”.
- For every Bronze SCD2 row, include provenance fields such as:
The key mindset:
Bronze is your future; EDW is a stepping-stone.
Bronze must be able to live on without the EDW when the time comes.
5.5 Parallel Run & Reconciliation — prove equivalence
Trust is not declared; it is earned. Parallel running is the period where the new platform proves it can carry regulatory weight without the safety net of the old one.
This phase is where confidence is built — or lost — through hard numbers, repeatable comparisons, and transparent explanation of differences. Shortcuts here almost always surface later during reviews or remediation.
For any serious EDW migration, you must plan for dual running and hard reconciliation.
- Parallel run
- Run EDW and lakehouse in parallel for at least one full reporting cycle, preferably more for Tier 1 dimensions.
- Feed both from the same upstream sources where possible.
- Reconciliation levels
- Row count by business key and SCD version.
- Attribute-level comparison for critical fields (name, address, DOB, risk flags, status, etc.).
- PIT comparison on key dates for a sample of customers/accounts.
- Downstream metric equivalence for selected reports and model inputs.
- Tolerance and exceptions
- Agree in advance where differences are acceptable (e.g. known EDW bugs you’re fixing).
- Track exceptions in an auditable log; do not sweep them away.
- Sign-off
- Involve risk, finance, and relevant business owners in sign-off.
- Document the comparison results and keep them as part of your evidence pack for future PRA/FCA questions.
Only once this is done should the lakehouse become the official system of record for that dimension.
5.6 Decommissioning & Archival — exit safely
Turning off an EDW dimension is not an engineering milestone; it is a governance decision. By the time decommissioning is considered, the lakehouse should already be behaving as the system of record for history.
What remains is ensuring that legacy data is preserved, discoverable, and defensible for future audits, investigations, or legal requests — even after the original platform is gone.
The final stage is to:
- Freeze the EDW SCD tables (for that dimension)
- Stop feeding them once you’re confident in Bronze.
- Put them into a “read-only for audit only” state.
- Archive key EDW tables to the lakehouse
- Land final EDW copies into an archival area in Raw/Base/Bronze.
- This provides a single-platform view of history when the EDW hardware is eventually retired.
- Redirect dependencies
- Point reports, models, and interfaces at Silver/Gold views built off Bronze.
- Update documentation, data dictionaries, and runbooks accordingly.
- Remove operational dependencies on EDW
- Governance should treat EDW as a legacy store, not a live dependency.
- Eventually, it can be powered down or significantly scaled back.
EDW decommissioning must respect statutory retention, legal hold, and audit replay obligations, which is why final EDW states are typically archived into the lakehouse rather than deleted.
6. Common Pitfalls and Anti-Patterns
Most failed migrations fail in familiar ways. The same shortcuts, assumptions, and pressures show up repeatedly across organisations, regardless of technology stack.
Calling these out explicitly is less about criticism and more about avoiding problems that are expensive, slow, and highly visible once regulators are involved.
A few patterns that almost always cause pain:
6.1 “Lift and Shift the EDW and Call It Bronze”
Simply copying EDW dimensions into a lake without re-anchoring them to Raw/Base/sources delays the hard work and creates a second copy of the same old design.
Use EDW as input, not as the final design.
6.2 Ignoring Precedence and Entity Resolution
Many EDW SCDs implicitly encode which system “wins” for each attribute and how entities are resolved. During migration, these rules must be made explicit and versioned over time.
If precedence or entity resolution logic is applied retrospectively or allowed to drift, historical reconstructions will silently change — breaking PIT semantics and undermining regulatory trust.
Many EDW SCDs quietly embed:
- “which system wins for which attribute”
- assumptions about de-duping and matching
If you don’t externalise those rules into an Entity Resolution and Precedence layer, you will:
- re-implement them inconsistently in the lakehouse;
- or lose them entirely in the migration.
During migration, precedence and entity resolution rules must be versioned and time-bound, otherwise historical reconstructions will silently change as logic evolves.
6.3 Breaking PIT Semantics
A careless migration can:
- change effective_from/effective_to semantics;
- alter what counts as “current”;
- or re-compute history incorrectly when backfilled from upstream feeds.
Any of these will break:
- complaints & remediation analysis;
- model backtesting;
- regulatory look-backs (“state as known on date X”).
6.4 Over-Engineering from Day One
Trying to introduce:
- attribute-level SCD2;
- fully event-driven CDC;
- perfect entity graphs;
- and a brand new conceptual model
all at once is a recipe for never finishing.
Get to “equivalent and correct” first; then refine.
7. Regulatory and Risk Considerations
From a regulatory perspective, a SCD migration is not a neutral technical exercise. It changes how historical truth is constructed, explained, and defended.
This section frames the migration in the terms regulators actually care about: continuity, traceability, reproducibility, and governance of change.
From a PRA/FCA perspective, a SCD migration will eventually be judged on:
- Continuity of reporting
- Key metrics (exposures, risk weights, impairment measures, etc.) must remain consistent or have a well-explained reason for change.
- Traceability
- You must be able to show, for any attribute on any date:
- which SCD2 record it came from;
- which source system(s);
- which precedence rule;
- which point in the migration timeline.
- You must be able to show, for any attribute on any date:
- Reproducibility
- If a regulator asks: “Rebuild your 31 Dec 2022 view using the new platform”
- you must be able to do so deterministically, or explain any differences.
- Governance of change
- Changes to SCD logic (business rules, precedence, ER) must be:
- documented;
- controlled;
- and tested, especially where they affect regulatory outputs.
- Changes to SCD logic (business rules, precedence, ER) must be:
The migration is not “just” a technology project. It is a change to the foundations of your regulatory data story. Treat it accordingly.
8. Example: Migrating a Customer Dimension
A simplified illustration.
8.1 Legacy State
EDW DIM_CUSTOMER:
cust_sk(surrogate key)cust_id(business key)name,address,dob,segment,risk_bandeffective_from,effective_to,current_flag- Sourced from: core banking, CRM, KYC engine.
- 15 years of history, used by risk and regulatory reporting.
8.2 Target Bronze
BRONZE_CUSTOMER_SCD2:
customer_sk(new surrogate)customer_id(business key)- attribute groups (identity, contact, risk, CRM)
effective_from,effective_to,is_currentsource_system,precedence_rule_id,edw_cust_sk,lineage_id- attribute hashes for change detection.
8.3 Migration Steps (Simplified)
- Land EDW
DIM_CUSTOMERinto Base asBASE_EDW_CUSTOMER. - Land upstream core banking and CRM extracts/CDC into their own Base tables.
- Build an initial
BRONZE_CUSTOMER_SCD2fromBASE_EDW_CUSTOMERonly, preserving SCD history. - Introduce precedence logic: where core banking/KYC disagree with EDW for recent history, allow upstream to override for new versions.
- Run EDW and Bronze in parallel for several months:
- compare PIT customer views on sample dates;
- reconcile row counts and key attributes.
- Once reconciled, switch downstream consumers to Silver views built from Bronze, then freeze EDW as legacy.
Over time, EDW’s role shrinks to “additional historical evidence”; Bronze becomes the temporal backbone.
9. Summary: From Legacy SCDs to a Future-Proof Bronze
Migrating legacy EDW Slowly Changing Dimensions to a lakehouse Bronze layer is not about:
- rewriting some ETL in Spark, or
- copying tables onto cheap storage.
It is about:
- re-anchoring history in a platform that can handle CDC, SCD2, PIT, entity resolution, and precedence;
- making implicit business rules explicit and governable;
- preserving the regulatory trust built up over a decade, while improving correctness and flexibility;
- and giving yourself a foundation where future work — Customer 360, advanced risk models, Consumer Duty look-backs, temporal ML — is actually possible.
A sensible migration:
- treats the EDW as a source system, not the final model;
- builds a canonical SCD2 Bronze anchored in Raw/Base;
- runs parallel and reconciled for as long as needed;
- gradually phases out EDW dependencies once confidence is earned;
- and leaves you with a platform where history is a first-class asset, not a fragile side-effect.
If you can say, after migration:
“We can reconstruct, with evidence, what our EDW believed on any date — and we can now do it better, across more domains, with clearer rules”
then you will have succeeded.