An opinionated but practical blueprint for regulated, temporal, multi-domain data platforms, focused on authority, belief, and point-in-time defensibility. This article lays out a reference architecture for UK FS in 2026: not as a rigid prescription, but as a description of what “good” now looks like in banks, insurers, payments firms, wealth platforms, and capital markets organisations operating under FCA/PRA supervision.
Contents
- Contents
- 1. Introduction: Why a Reference Architecture Is Needed
- 2. Core Design Principles for a 2026 UK FS Lakehouse
- 3. The High-Level Logical Architecture
- 4. Data Layers: Raw, Base, Bronze, Entity/Precedence, Silver, Gold, Platinum
- 4.1 Raw Layer – “As Landed”
- 4.2 Base Layer – “Normalised Transport Structures”
- 4.3 Bronze Layer – “SCD2 Temporal Backbone”
- 4.4 Entity Resolution & Precedence Layer – “Who is Who, and Which Source Wins”
- 4.5 Silver Layer – “Non-SCD Current-State, Consumption-Aligned Views”
- 4.6 Gold Layer – “Business Context and Data Products”
- 4.7 Platinum Layer – “Conceptual / Semantic Model”
- 5. Cross-Cutting Capabilities
- 6. Workloads and Consumer Patterns
- 7. Implementation Approaches (Databricks, Snowflake, Others)
- 8. Migration and Coexistence with Legacy EDW
- 9. Operating the Architecture in a Regulated Environment
- 10. Conclusion: A New Definition of “Mature”
1. Introduction: Why a Reference Architecture Is Needed
By 2026, most UK Financial Services institutions will converge on a broadly similar lakehouse architecture: streaming and batch ingestion into Raw, SCD2-heavy Bronze as the temporal backbone, non-SCD Silver for current-state consumption, Gold for business-aligned data products, and Platinum for a conceptual/semantic layer that unifies meaning across domains.
Wrapped around this are a set of cross-cutting capabilities: event-driven CDC, entity resolution, golden-source precedence, enterprise point-in-time (PIT) reconstruction, governance and metadata, and production-grade testing for temporal pipelines.
Most UK Financial Services organisations are now on the third or fourth iteration of “modernising the data platform”:
- the legacy EDW era (Teradata, Oracle, DB2)
- the first-generation data lake
- the early “lakehouse” experiments
- and now, the regulated, temporal, multi-domain lakehouse that actually has to stand up in front of PRA/FCA.
Along the way, firms have learned some hard lessons:
- You can’t bolt on auditability after the fact.
- You can’t fake point-in-time reconstruction.
- You can’t have five different “Customer 360s” and expect regulators to accept your numbers.
- You can’t rely solely on batch in a world of streaming CDC.
The aim of this reference architecture is not to claim “the one true way”, but to:
- capture the common patterns used by the most mature UK FS platforms,
- show how SCD2 Bronze, non-SCD Silver, Gold, and Platinum layers fit together,
- and make explicit the cross-cutting elements — entity resolution, precedence, PIT, governance, testing — that separate proof-of-concept platforms from production-grade, regulator-defensible ones.
This is the capstone article of the “land it early, manage it early” series on SCD2-driven Bronze architectures for regulated Financial Services: a converged blueprint for temporal FS platforms, written for architects, CDOs, and engineering leads who need a mature reference model, and a synthesis of how to implement the series doctrine at scale.
1.1 A clarification on authority, truth, and belief
This reference architecture is built on a distinction that is often left implicit in financial services data platforms:
Operational systems exercise authority.
Truth exists outside systems.
Data platforms manage institutional belief over time.
Transactional systems, CRM platforms, market feeds, and decision engines are authoritative at the moment of action or assertion. The lakehouse does not replace these systems or override their decisions. Its role is to preserve, contextualise, and explain how the institution’s belief evolved based on the evidence available at the time.
Regulators do not assess platforms based on whether they produce a single, perfectly consistent view. They assess whether firms can explain what was believed, when it was believed, why it was believed, and how that belief changed as new information arrived.
This architecture is therefore not designed to eliminate disagreement or delay. It is designed to preserve history, make authority explicit, and allow belief to be reconstructed and defended under scrutiny.
Several aspects of this architecture — including authority domains, belief reconstruction, and point-in-time defensibility — are explored in more depth elsewhere in the series; this article focuses on how they come together as a coherent system.
1.2 What “Lakehouse” Really Means in Financial Services (and What It Does Not Mean)
The term lakehouse now appears in every architecture deck, vendor pitch, and transformation programme across UK Financial Services — but it is often misunderstood. A lakehouse is not a product and it is not a synonym for “data platform.” It is an architectural pattern, one that unifies the strengths of data lakes and data warehouses into a single, governed, scalable ecosystem.
A true lakehouse combines:
- Object storage using open formats (Parquet, Delta, Iceberg, Hudi)
- Decoupled compute that scales elastically
- Warehouse-style table behaviours (ACID, schema enforcement, governance)
- Low-cost, multi-modal storage for raw, semi-structured, and structured data
- Full lineage and metadata integration
Multiple vendors now implement lakehouse capabilities:
- Databricks (Delta Lake)
- Snowflake (Iceberg / hybrid architecture)
- Google BigQuery
- AWS open table formats (Iceberg/Hudi)
- Microsoft Fabric (OneLake)
The breadth of implementations underlines that the lakehouse is a pattern, not a proprietary invention.
By contrast, a data platform is significantly broader. It encompasses ingestion, modelling, governance, catalogue, SCD2 processing, PIT logic, security, operational tooling, DevOps, domain products, lineage, and consumption patterns. The lakehouse is one foundational element, not the platform itself.
In the 2026 reference architecture, the lakehouse provides the storage and compute substrate; the data platform provides the operating model, governance, metadata, and domain-centric consumption layers that make it usable, safe, and regulator-ready.
1.3 Scope and Provenance
This work was catalysed by a series of conversations with Barnaby O’Callaghan on how differing organisational units experience data platforms under regulation.
This reference architecture is not theoretical. It reflects patterns observed repeatedly across large, regulated organisations: shaped by conversations with data engineering, architecture, risk, compliance, and operational teams inside UK Financial Services, and informed by experience across government, insurance, utilities, travel, and consumer-facing platforms.
The examples, constraints, and trade-offs described here are drawn from platforms that have been challenged, corrected, and scrutinised in practice. The intent is not to describe an idealised end state, but to capture what has consistently survived regulatory review, operational pressure, and scale.
2. Core Design Principles for a 2026 UK FS Lakehouse
Before any discussion of layers, tools, or workloads, it is worth being explicit about the constraints this architecture is designed to satisfy. These principles are not aspirational values; they are hard-earned positions shaped by regulatory scrutiny, operational failure, and repeated attempts to simplify problems that turn out not to be simple. They exist to prevent architectural drift, not to decorate diagrams.
A credible UK FS lakehouse in 2026 is anchored on a few principles:
- Temporal by design: Time is first-class. Bronze is explicitly temporal, and point-in-time reconstruction is a primary design goal. Historical belief is preserved exactly as it was known at the time, without overwrite.
- Regulator-defensible: Every transformation can be explained. Precedence rules, entity resolution, and PIT logic are versioned, governed, and replayable.
- Separation of concerns: Bronze holds full history. Silver provides current-state simplifications. Gold adds business context. Platinum abstracts conceptual meaning. Mixing these responsibilities is where platforms usually degrade. These separations are conceptual and contractual first; in mature platforms, they may not always map one-to-one to physical storage boundaries.
- Multi-domain from the start: Customer, account, party, product, contract, transaction, and ledger data are modelled as interacting domains — not as isolated silos.
- Event- and batch-capable: CDC, streaming, and batch are treated coherently. Event time, not arrival time or processing time, is the primary ordering dimension for belief reconstruction.
- Entity-centric: Entity resolution (Customer / Party / Account) is anchored in Bronze as temporal belief and shared consistently across use cases.
- Testable, observable, and recoverable: Temporal pipelines are tested, monitored, and repairable. Reprocessing and replay are designed in, not improvised in crisis.
3. The High-Level Logical Architecture
Once the underlying principles are clear, the next step is to understand how responsibility is distributed across the platform. This logical view is deliberately abstracted from vendors and deployment detail, focusing instead on where decisions are made, where history is preserved, and where control and consumption intersect. It provides the mental model needed to reason about failure, scale, and regulatory challenge before worrying about implementation.
If you were to draw this on a whiteboard, it would typically include four vertical “planes” and multiple horizontal “layers”:
- Ingress Plane – batch files, APIs, CDC streams, XML/JSON blobs, events
- Data Layers – Raw → Base → Bronze → Entity/Precedence → Silver → Gold → Platinum
- Governance & Control Plane – catalogue, lineage, glossary, policy, quality, metadata, security
- Consumption Plane – BI/MI, risk & actuarial models, AI/ML, operational systems, regulatory reporting, reconciliation, financial crime, analytics teams
Under the covers, this might be implemented with different technologies (Databricks, Snowflake, BigQuery, Iceberg/Hudi, Fabric, etc.), but the logical shape is broadly the same.
These planes describe responsibilities and guarantees, not mandatory tooling or physical zones; different implementations may collapse or distribute them while preserving the same invariants.
4. Data Layers: Raw, Base, Bronze, Entity/Precedence, Silver, Gold, Platinum
Layering in this architecture is not about storage hierarchy or maturity staging; it is about separating concerns that must not be collapsed without consequence. Each layer exists to enforce a specific set of guarantees, and confusion between them is one of the most common causes of regulatory and operational failure in financial services platforms.
The medallion concept remains useful, but in 2026 we can be more specific.
4.1 Raw Layer – “As Landed”
Purpose:
- Capture source data exactly as received, without business logic.
- Preserve blobs (XML/JSON), bytes, encodings, headers, envelopes.
- Provide the last line of defence for investigations.
Characteristics:
- Append-only, immutable, lineage-rich.
- Minimal transformations (e.g. decryption, decompression, basic validation).
- Suitable for holding CDC topics, file drops, and legacy extracts.
Raw artefacts are preserved as evidence, not as staging inputs to be discarded once parsed.
4.2 Base Layer – “Normalised Transport Structures”
Purpose:
- Normalise raw payloads into queryable structures without changing business meaning.
- Standardise date formats, encodings, basic types, and simple structural variations.
Characteristics:
- XML/JSON parsed into structured columns and/or variant/struct fields.
- Still one-to-one with source systems/tables/messages.
- No SCD2 yet; no precedence; no cross-source joining.
Think of Base as source-aligned but cleaned enough to be usable by downstream SCD2 logic.
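As a concrete illustration, the sketch below normalises a JSON payload into Base columns with PySpark. The table names (raw.customer_events, base.customer_events), the payload schema, and the envelope columns are assumptions for the example; the point is that structure is added without changing business meaning, and the original payload is retained.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Illustrative payload schema; real schemas are source- and message-specific.
payload_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("full_name", StringType()),
    StructField("date_of_birth", StringType()),
    StructField("updated_at", StringType()),
])

raw = spark.table("raw.customer_events")  # hypothetical Raw landing table

base = (
    raw
    .withColumn("payload", F.from_json(F.col("payload_json"), payload_schema))
    .select(
        "source_system",
        "ingested_at",
        F.col("payload.customer_id").alias("customer_id"),
        F.col("payload.full_name").alias("full_name"),
        F.to_date("payload.date_of_birth", "yyyy-MM-dd").alias("date_of_birth"),
        F.to_timestamp("payload.updated_at").alias("source_updated_at"),
        "payload_json",  # retain the original payload alongside the parsed columns
    )
)

base.write.mode("append").saveAsTable("base.customer_events")  # hypothetical Base table
```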
4.3 Bronze Layer – “SCD2 Temporal Backbone”
Purpose:
- Hold the institution’s historical belief, as asserted and interpreted at the time, for each domain as SCD2 tables.
- Serve as the canonical repository for “what we knew, when we knew it”.
Characteristics:
- SCD2 applied at row or attribute level (or both).
- Effective_from/effective_to, is_current, hashes for change detection.
- Event-time ordering and late/out-of-order event repair.
- Handling embedded XML/JSON as hybrid: key attributes flattened; payload retained.
- Optimised via hybrid patterns (delta_log + periodic MERGE, compaction, temporal windows).
Bronze is the foundation for:
- PIT reconstruction
- regulatory forensics
- historical analytics
- entity resolution
- precedence and survivorship
Bronze does not claim to represent objective truth. It preserves assertions, interpretations, corrections, and enrichments exactly as they were known at the time they occurred. This distinction is critical: regulators assess belief and decision-making under uncertainty, not hindsight-corrected truth.
Bronze also makes an explicit distinction between immutable events and mutable state. Transactions, ledger entries, and other facts that occur once are recorded as events and are not treated as slowly changing dimensions. SCD2 is reserved for state, interpretation, and belief that legitimately evolve over time.
The distinction between immutable events and slowly changing state is critical in regulated environments and is examined in detail elsewhere in the series.
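To make the Bronze mechanics tangible, here is a simplified close-and-insert SCD2 sketch using two Delta Lake SQL statements. The tables (bronze.customer, base.customer_changes), their columns, and the attr_hash change-detection column are illustrative assumptions; production pipelines additionally handle late and out-of-order events, in-batch deduplication, attribute-level tracking, and idempotent re-runs.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Step 1: close the open version where the tracked-attribute hash has changed.
spark.sql("""
    MERGE INTO bronze.customer AS tgt
    USING base.customer_changes AS src
      ON  tgt.customer_id = src.customer_id
      AND tgt.is_current = true
    WHEN MATCHED AND tgt.attr_hash <> src.attr_hash THEN UPDATE SET
      effective_to = src.event_ts,
      is_current   = false
""")

# Step 2: open a new version for brand-new or just-closed (changed) records.
spark.sql("""
    INSERT INTO bronze.customer
    SELECT
      src.customer_id,
      src.full_name,
      src.segment,
      src.attr_hash,
      src.event_ts                      AS effective_from,
      TIMESTAMP '9999-12-31 00:00:00'   AS effective_to,
      true                              AS is_current,
      src.source_system
    FROM base.customer_changes AS src
    LEFT JOIN bronze.customer AS cur
      ON  cur.customer_id = src.customer_id
      AND cur.is_current = true
    WHERE cur.customer_id IS NULL   -- new entity, or its open version was closed in step 1
""")
```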
4.4 Entity Resolution & Precedence Layer – “Who is Who, and Which Source Wins”
Purpose:
- Decide which records belong to the same entity and which source is authoritative per attribute per time.
Components:
- Entity Resolution (ER) tables:
  - record_id ↔ entity_id ↔ link_type ↔ link_confidence ↔ temporal versions
- Precedence tables:
  - attribute_group, source_system, precedence_rank, effective_from/effective_to
Characteristics:
- Built on top of Bronze SCD2 history.
- Uses deterministic, fuzzy, probabilistic, and graph-based matching (as appropriate).
- Precedence rules versioned and governed as configuration.
- Entity clusters and link decisions themselves SCD2’d.
This layer is what turns “SCD2 per system” into a coherent, time-qualified enterprise entity belief.
Entity links, survivorship decisions, and precedence rules are themselves treated as temporal data and are subject to SCD2, versioning, and audit.
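A minimal sketch of precedence-based survivorship follows, assuming per-source beliefs already keyed by enterprise entity_id and a snapshot of the currently effective precedence rules. Table and column names are hypothetical, and real implementations also record which rule version produced each surviving value.

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical inputs: per-source current beliefs keyed by enterprise entity_id,
# and the currently effective precedence rules (one rank per attribute_group x source_system).
beliefs = spark.table("er.entity_attribute_beliefs")        # entity_id, attribute_group, source_system, value, asserted_at
rules   = spark.table("control.precedence_rules_current")   # attribute_group, source_system, precedence_rank

w = Window.partitionBy("entity_id", "attribute_group").orderBy(
    F.col("precedence_rank").asc(),    # lower rank = more authoritative source
    F.col("asserted_at").desc(),       # tie-break on recency within equal rank
)

survivors = (
    beliefs
    .join(rules, ["attribute_group", "source_system"], "inner")
    .withColumn("rn", F.row_number().over(w))
    .where(F.col("rn") == 1)
    .drop("rn")
)

survivors.write.mode("overwrite").saveAsTable("silver.entity_attributes")  # illustrative target
```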
4.5 Silver Layer – “Non-SCD Current-State, Consumption-Aligned Views”
Silver represents governed read models: simplified, current-state interpretations optimised for consumption, not authority.
Purpose:
- Provide clean, simplified, current-state views for most consumers.
- Hide SCD2 and ER complexity behind stable table contracts.
Characteristics:
- One row per business entity (customer, account, product, etc.), per domain.
- Uses entity_id and precedence to derive current attributes.
- No effective_from/effective_to in the public contracts (internally they may exist for PIT maintenance).
- May expose PIT snapshots as of standard cut points (e.g. end-of-day, end-of-month) without surfacing SCD2 mechanics.
Silver is usually the first connection point for:
- BI/MI teams
- quants and actuaries
- ML feature pipelines
- operational reporting
Disagreement between Silver and upstream authoritative systems is expected during convergence and is explainable via Bronze history and precedence logic.
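A minimal sketch of the Silver contract, assuming a Bronze table (bronze.customer_resolved) that already carries entity_id and precedence-applied attributes; the only point being made is that the public view exposes one current row per entity and hides the SCD2 mechanics.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Public Silver contract: one row per customer, no SCD2 columns exposed.
spark.sql("""
    CREATE OR REPLACE VIEW silver.customer AS
    SELECT
      entity_id,
      full_name,
      segment,
      risk_rating,
      source_system
    FROM bronze.customer_resolved
    WHERE is_current = true
""")
```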
4.6 Gold Layer – “Business Context and Data Products”
Purpose:
- Provide business-aligned, domain-specific datasets and services.
- Represent agreed definitions of KPIs, measures, and aggregates.
Examples:
- Retail banking customer profitability views
- IFRS9 staging and model input sets
- liquidity risk cubes
- operational resilience impact views
- financial crime case feeds
- pricing and margin dashboards
Characteristics:
- Aggregations over Silver/entity-level data.
- Heavier business logic.
- Explicit ownership by domain teams (Data Mesh-style).
- Rich documentation and acceptance criteria (often co-owned by business and data).
Gold is where most business value is realised.
Gold datasets are opinionated read models aligned to specific business decisions, metrics, or regulatory outcomes.
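As an illustration of the business logic that lives in Gold, the sketch below derives a customer profitability product from Silver. All table and column names are hypothetical; real Gold products carry documented definitions, domain ownership, and acceptance criteria.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical Silver inputs: entity-level customers and transaction-level economics.
customers    = spark.table("silver.customer")      # entity_id, segment, ...
transactions = spark.table("silver.transaction")   # entity_id, fee_income, interest_income, servicing_cost

profitability = (
    transactions
    .join(customers.select("entity_id", "segment"), "entity_id")
    .groupBy("entity_id", "segment")
    .agg(
        F.sum("fee_income").alias("fee_income"),
        F.sum("interest_income").alias("interest_income"),
        F.sum("servicing_cost").alias("servicing_cost"),
    )
    .withColumn(
        "contribution",
        F.col("fee_income") + F.col("interest_income") - F.col("servicing_cost"),
    )
)

profitability.write.mode("overwrite").saveAsTable("gold.customer_profitability")
```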
4.7 Platinum Layer – “Conceptual / Semantic Model”
Purpose:
- Provide a platform-wide conceptual model of entities and relationships across domains.
- Express “how the business sees the world”, independent of specific schemas.
Platinum is not a physical layer but a semantic contract that stabilises meaning across domains, time, and implementations.
Characteristics:
- Defines entities (Customer, Account, Product, Contract, Party, Relationship, Transaction, Ledger Entry).
- Defines relationships (owns, guarantees, linked_to, household_of, beneficial_owner_of, party_to).
- Maps physical tables, attributes, and ER entities to conceptual constructs.
- Acts as the semantic layer for:
  - self-service analytics
  - knowledge graphs
  - cross-domain risk and conduct analysis
  - model explainability
Platinum is where business meaning lives, and where cross-domain questions are answered consistently.
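One way to picture the Platinum contract is as an explicit mapping from conceptual entities and relationships to the physical assets that realise them. The fragment below is purely illustrative (every name is hypothetical); in practice this usually lives in a catalogue, glossary, or ontology tool rather than in code.

```python
# Illustrative fragment of a semantic contract: conceptual entities, their keys,
# the physical assets that realise them, and the relationships between them.
semantic_model = {
    "entities": {
        "Customer": {"key": "entity_id",  "physical": ["silver.customer", "bronze.customer"]},
        "Account":  {"key": "account_id", "physical": ["silver.account"]},
        "Party":    {"key": "party_id",   "physical": ["silver.party"]},
    },
    "relationships": [
        {"name": "owns",     "from": "Customer", "to": "Account", "via": "silver.customer_account_link"},
        {"name": "party_to", "from": "Party",    "to": "Account", "via": "silver.party_account_link"},
    ],
}
```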
5. Cross-Cutting Capabilities
Some properties of the platform cannot be isolated to a single layer without becoming ineffective. These capabilities operate across the entire architecture, shaping how data moves, how decisions are derived, and how history can be defended. They are not optional enhancements; they are what make the layered model coherent under real-world pressure.
Several capabilities cut across all layers.
5.1 Event-Driven CDC Ingestion
- Support for Debezium, GoldenGate, SQL CDC, Kafka/MSK, etc.
- Ordering buffers, watermarks, late/out-of-order handling.
- Canonical CDC events feeding Base and SCD2 Bronze.
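A minimal structured-streaming sketch of this ingestion path, assuming a Debezium-style envelope on a Kafka topic; broker, topic, schema, and checkpoint path are placeholders, and real pipelines add schema-registry integration, dead-letter handling, and ordering buffers.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Illustrative Debezium-style envelope; real envelopes vary by connector and topic.
cdc_schema = StructType([
    StructField("op", StringType()),            # c / u / d
    StructField("source_ts", TimestampType()),  # event time in the source system
    StructField("after", StringType()),         # serialised row image after the change
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
    .option("subscribe", "core_banking.customer")       # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), cdc_schema).alias("e"))
    .select("e.*")
    .withWatermark("source_ts", "2 hours")      # event-time lateness bound
    .dropDuplicates(["source_ts", "after"])     # discard replayed events; state bounded by the watermark
)

(
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/base/customer_cdc")  # placeholder path
    .outputMode("append")
    .toTable("base.customer_cdc_events")                     # hypothetical canonical CDC table
)
```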
5.2 SCD2 & Temporal Pipelines
- Standard libraries/patterns for SCD2 merge logic.
- Attribute-level SCD2 for wide tables.
- Hybrid SCD2 optimisation (delta_log + periodic merge).
- Temporal compaction windows to prevent version spam.
- Temporal repair and backfill processes.
SCD2 is applied only to mutable state and interpretations, not to immutable events such as transactions.
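The change-detection hash that drives SCD2 merges can stay deliberately simple. A sketch follows, assuming the tracked attribute group is configured as a plain column list; nulls are normalised explicitly so that NULL-to-value transitions register as changes.

```python
from pyspark.sql import DataFrame, functions as F

# Illustrative tracked attribute group; in practice this is metadata-driven configuration.
TRACKED_ATTRS = ["full_name", "segment", "risk_rating"]

def with_attr_hash(df: DataFrame) -> DataFrame:
    """Add a deterministic hash over the tracked attributes for SCD2 change detection."""
    normalised = [
        F.coalesce(F.col(c).cast("string"), F.lit("<null>")) for c in TRACKED_ATTRS
    ]
    return df.withColumn("attr_hash", F.sha2(F.concat_ws("||", *normalised), 256))
```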
5.3 Entity Resolution & Matching
- Shared ER engine for Customer/Party/Account/Household.
- Deterministic + fuzzy + probabilistic matches.
- SCD2’d link tables and clusters.
- Manual review pipelines where human confirmation is needed.
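A deliberately small sketch of the deterministic-plus-fuzzy decision step, using only the Python standard library. Identifiers, thresholds, and the review routing are illustrative; production ER engines use richer features, blocking strategies, and probabilistic models.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Crude fuzzy similarity on normalised names; production ER uses richer features."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_decision(rec_a: dict, rec_b: dict) -> tuple[str, float]:
    """Return (link_type, link_confidence) for a candidate pair. Thresholds are illustrative."""
    # Deterministic: exact match on a strong identifier wins outright.
    if rec_a.get("national_id") and rec_a["national_id"] == rec_b.get("national_id"):
        return "deterministic", 1.0

    # Fuzzy: combine name similarity with agreement on date of birth.
    score = name_similarity(rec_a["full_name"], rec_b["full_name"])
    if rec_a.get("date_of_birth") == rec_b.get("date_of_birth"):
        score = min(1.0, score + 0.2)

    if score >= 0.92:
        return "fuzzy_auto", score
    if score >= 0.80:
        return "fuzzy_review", score   # route to the manual review pipeline
    return "no_link", score
```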
5.4 Golden-Source Precedence
- Metadata-driven precedence matrices for attributes vs source systems.
- Versioned precedence rules with effective periods.
- Applied consistently in Silver, Gold, and PIT.
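The precedence matrix itself is plain, versioned data. A hypothetical fragment (sources, groups, and dates are invented) might look like this:

```python
# Illustrative versioned precedence rules: lower rank wins; rules carry effective periods
# so that changes to the rules themselves are auditable and replayable.
precedence_rules = [
    {"attribute_group": "contact_details", "source_system": "crm",          "precedence_rank": 1,
     "effective_from": "2024-01-01", "effective_to": "9999-12-31"},
    {"attribute_group": "contact_details", "source_system": "core_banking", "precedence_rank": 2,
     "effective_from": "2024-01-01", "effective_to": "9999-12-31"},
    {"attribute_group": "risk_rating",     "source_system": "credit_risk",  "precedence_rank": 1,
     "effective_from": "2025-07-01", "effective_to": "9999-12-31"},
]
```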
5.5 Enterprise PIT Engine
- Engine to reconstruct “state as known” and “state as now known” on a given date.
- Materialised PIT datasets for regulatory and model backtesting.
- Integration with ER and precedence logic.
The PIT engine reconstructs belief as-known and belief as-now-known; it does not rewrite historical belief to reflect later corrections.
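A minimal PySpark sketch of the two reconstructions, assuming Bronze rows carry effective_from/effective_to in event time plus a recorded_at load timestamp; names are illustrative, and a real PIT engine also applies the ER links and precedence rules that were in force at the as-of date.

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

AS_OF = "2025-06-30 23:59:59"
bronze = spark.table("bronze.customer")  # hypothetical SCD2 table with a recorded_at load timestamp

# "As now known": the version effective on the as-of date, including corrections loaded later.
as_now_known = bronze.where(
    (F.col("effective_from") <= AS_OF) & (F.col("effective_to") > AS_OF)
)

# "As known": restrict to versions the platform had actually recorded by the as-of date,
# then take the latest effective version per key among those.
w = Window.partitionBy("customer_id").orderBy(F.col("effective_from").desc())
as_known = (
    bronze
    .where((F.col("recorded_at") <= AS_OF) & (F.col("effective_from") <= AS_OF))
    .withColumn("rn", F.row_number().over(w))
    .where(F.col("rn") == 1)
    .drop("rn")
)
```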
5.6 Governance, Metadata, and Lineage
- Unified catalogue (e.g. Unity Catalog, Snowflake + external catalogue, Fabric/Purview).
- Technical and business lineage from Raw → Platinum.
- Policy enforcement for PII/GDPR, retention, masking.
- Business glossary linked to schemas, attributes, and precedence rules.
5.7 Testing, Observability, and SLOs
- Temporal tests: late data, backfill, precedence changes, PIT consistency.
- Data quality checks and anomaly detection.
- SLIs/SLOs for data freshness, completeness, and correctness.
- Operational dashboards and alerts.
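Temporal tests are often easiest to express as invariants over the version history of a single business key. A pytest-style sketch follows, assuming versions are plain dictionaries with effective_from, effective_to, and is_current fields:

```python
from datetime import datetime

def check_scd2_invariants(versions: list[dict]) -> list[str]:
    """Validate SCD2 invariants for one business key: a single open version,
    and contiguous, non-overlapping effective periods."""
    errors = []
    ordered = sorted(versions, key=lambda v: v["effective_from"])

    if sum(1 for v in ordered if v["is_current"]) != 1:
        errors.append("expected exactly one open (is_current) version")

    for prev, nxt in zip(ordered, ordered[1:]):
        if prev["effective_to"] > nxt["effective_from"]:
            errors.append(f"overlapping versions at {nxt['effective_from']}")
        if prev["effective_to"] < nxt["effective_from"]:
            errors.append(f"gap before {nxt['effective_from']}")
    return errors

def test_scd2_versions_are_contiguous():
    versions = [
        {"effective_from": datetime(2025, 1, 1), "effective_to": datetime(2025, 3, 1), "is_current": False},
        {"effective_from": datetime(2025, 3, 1), "effective_to": datetime(9999, 12, 31), "is_current": True},
    ]
    assert check_scd2_invariants(versions) == []
```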
6. Workloads and Consumer Patterns
Architectures ultimately succeed or fail based on how they are used. Different consumers place fundamentally different demands on data — in terms of timeliness, stability, explainability, and reproducibility — and a credible reference architecture must acknowledge those differences explicitly rather than pretending a single access pattern can satisfy all needs.
On top of this architecture, you typically see:
6.1 Analytics & BI/MI
- Analysts and MI teams primarily on Silver and Gold.
- Self-service reporting layers backed by semantic models (Platinum).
6.2 Risk, Finance, and Actuarial Models
- Consume PIT and Gold datasets.
- Require reproducible model input sets.
- Depend on correct SCD2, ER, and precedence.
6.3 Financial Crime and AML/KYC
- Heavy users of Bronze (for historical forensics) and Gold (for case feeds).
- Require entity graphs, temporal trails, and PIT for decisions.
6.4 Fraud, Credit Decisioning, and Real-Time Scoring
- Use streaming Silver (current state) and recent Bronze deltas.
- Often integrate with event streams for near-real-time signals.
These workloads typically operate at the edge using authoritative systems, with the lakehouse providing historical context, enrichment, and explainability rather than synchronous control.
6.5 Regulatory Reporting and Conduct
- Use a mix of Gold (reporting views) and PIT (as-known/now-known).
- Require deterministic, documentable flows end-to-end.
6.6 Operational Systems and APIs
- Integrate via well-defined domain services built on Silver/Gold.
- Use entity_id as the spine to join across domains.
7. Implementation Approaches (Databricks, Snowflake, Others)
Although the architecture is presented in logical terms, it must survive contact with real platforms, real teams, and real constraints. This section grounds the reference model in the practical realities of today’s dominant technologies, without allowing tooling choices to redefine architectural intent.
The architecture is logical, not vendor-specific, but practical implementations tend to converge on:
7.1 Databricks / Delta Lake Focus
- Delta tables for Raw/Base/Bronze/Silver/Gold.
- Unity Catalog for governance and lineage.
- Delta Live Tables / structured streaming for CDC and SCD2.
- MLflow for model tracking and deployment.
- ER and PIT engines implemented in Spark/Delta.
7.2 Snowflake-Centric
- Snowflake tables for Base/Bronze/Silver/Gold.
- Streams & Tasks for CDC and change processing.
- VARIANT for semi-structured payloads.
- External metadata/lineage tools (Atlan, Collibra, Alation, etc.) for governance.
- dbt or similar for transformation management.
7.3 Hybrid / Multi-Platform
- Databricks for heavy ETL and SCD2 Bronze.
- Snowflake or BigQuery for analytic Silver/Gold.
- Iceberg/Hudi layers for long-term archival.
- Central governance over multiple engines.
The key is that regardless of tooling, the separation of concerns and temporal backbone remain the same.
8. Migration and Coexistence with Legacy EDW
Few institutions have the luxury of clean breaks. Any reference architecture that assumes wholesale replacement of legacy systems is aspirational at best and misleading at worst. This section acknowledges coexistence as the norm and treats migration as a controlled, multi-year architectural state rather than a one-off project.
By 2026, very few institutions will have fully retired their legacy EDWs. Instead, we see:
- Coexistence: EDW handles stable reporting workloads; lakehouse takes on new domains and high-volume/temporal needs.
- Progressive migration: SCD2 logic and critical dimensions gradually re-homed to Bronze.
- Bridging layers: EDW data landed into Raw/Base/Bronze to join with new domains.
A realistic reference architecture assumes:
- a multi-year journey,
- dual-running for key reports,
- and careful mapping of legacy SCDs to new SCD2 Bronze structures.
9. Operating the Architecture in a Regulated Environment
Designing an architecture is significantly easier than operating it under scrutiny. This section addresses what happens after go-live: when data is challenged, corrected, replayed, and questioned by people who were not involved in building the platform. Operational credibility is where architectural intent is either validated or exposed.
Finally, it’s worth stating the obvious: architecture diagrams are the easy part. Operating this in anger, under PRA/FCA scrutiny, is where it either succeeds or fails.
Operational maturity is measured not by uptime alone, but by the platform’s ability to explain its own past behaviour under challenge.
A platform that cannot preserve evidence, control mutation of data and belief, and reconstruct past handling of information under challenge is operationally insecure, regardless of perimeter controls.
Cost behaviour is treated as a governance signal: uncontrolled recomputation, replay, or duplication is considered an operational and control failure, not an efficiency concern.
Key operational realities:
- Change control for ER rules, precedence, and PIT logic.
- Release management for pipelines that affect regulatory outputs.
- Runbooks for temporal repair, reprocessing, and backfills.
- Audit trails for who changed what, when, and why.
- Collaboration between data engineering, risk, compliance, and business teams.
A platform that cannot explain its own changes will struggle in any serious review.
Several of these operational implications are explored in more detail elsewhere in the series.
10. Conclusion: A New Definition of “Mature”
Rather than restating components or technologies, this closing section reframes maturity in terms that matter in regulated environments. It draws a boundary between platforms that merely function and platforms that can explain themselves — consistently, defensibly, and over time.
By 2026, a mature UK Financial Services lakehouse is not defined by:
- how many nodes it runs,
- which vendor is used,
- or how fashionable its diagrams look.
It is defined by whether it can:
- capture and preserve full temporal history correctly (SCD2 Bronze),
- provide clean, stable, current-state views (non-SCD Silver),
- expose business-aligned data products (Gold),
- articulate a coherent conceptual model (Platinum),
- resolve entities and precedence deterministically and explainably,
- reconstruct point-in-time states on demand,
- withstand late, out-of-order, and corrected data,
- and defend all of this in front of regulators, auditors, model risk, and internal challenge.
Put simply:
In practice, a regulated data platform must be able to answer one question, consistently and under scrutiny: “What did we believe, based on which authority, at any time, and how did that belief evolve?”
Everything else — tools, vendors, buzzwords — is implementation detail.