Tag Archives: UK FS SCD2 Bronze

The “Land It Early, Manage It Early” Series

This series sets out a practical doctrine for building data platforms in highly regulated Financial Services environments. Its core premise is simple: firms must be able to reconstruct what was known, when it was known, and why decisions were made — using preserved evidence, not hindsight.

The articles develop an architectural pattern that treats temporal fidelity as foundational. Historical state is captured early at ingestion, governed before consumption, and only then transformed for analytics, reporting, and AI. Slowly Changing Dimension Type 2 (SCD2) is used as enabling infrastructure rather than a modelling afterthought, allowing platforms to scale while remaining regulator-defensible.

Across the series, this approach is applied to real operational concerns: ingestion, identity, multi-source precedence, point-in-time reconstruction, consumption layers, AI integration, cost, security, and operating models. Taken together, the work describes what “land it early, manage it early” looks like in practice for modern regulated data platforms.

From Writes to Reads: Applying CQRS Thinking to Regulated Data Platforms

In regulated financial environments, data duplication is often treated as a failure rather than a necessity. Command Query Responsibility Segregation (CQRS) is a pattern that separates the model used for writes from the model used for reads. This article reframes duplication through CQRS-style thinking, arguing that separating write models (which execute actions) from read models (which explain outcomes) is essential for both safe operation and regulatory defensibility. By making authority explicit and accepting eventual consistency, institutions can act in real time while reconstructing explainable, auditable belief over time. CQRS is presented not as a framework, but as a mental model for survivable data platforms.

Edge Systems Are a Feature: Why OLTP, CRM, and Low-Latency Stores Must Exist

Modern data platforms often treat operational systems as legacy constraints to be eliminated. This article argues the opposite. Transactional systems, CRM platforms, and low-latency decision stores exist because some decisions must be made synchronously, locally, and with authority. These “edge systems” are not architectural debt but purpose-built domains of control. A mature data platform does not replace them or centralise authority falsely; it integrates with them honestly, preserving their decisions, context, and evolution over time.

Blobs as First-Class Artefacts in Regulated Data Platforms

In regulated financial services, semi-structured payloads such as XML, JSON, PDFs, and messages are not “raw data” to be discarded after parsing: they are primary evidence. This article argues that blobs must be treated as first-class artefacts: preserved intact, timestamped, queryable, and reinterpretable over time. Relational models are interpretations that evolve; original payloads anchor truth. Platforms that discard or mutate artefacts optimise for neatness today at the cost of defensibility tomorrow.

Why Transactions Are Events, Not Slowly Changing Dimensions

This article argues that modelling transactions as slowly changing dimensions is a fundamental category error in financial data platforms. Transactions are immutable events that occur once and do not change; what evolves is the organisation’s interpretation of them through enrichment, classification, and belief updates. Applying SCD2 logic to transactions conflates fact with interpretation, corrupts history, and undermines regulatory defensibility. By separating immutable event records from mutable interpretations, platforms become clearer, auditable, and capable of reconstructing past decisions without rewriting reality.

Authority, Truth, and Belief in Financial Services Data Platforms

Financial services data architectures often fail by asking the wrong question: “Which system is the system of record?” This article argues that regulated firms operate with multiple systems of authority, while truth exists outside systems altogether. What data platforms actually manage is institutional belief: what the firm believed at a given time, based on available evidence. By separating authority, truth, and belief, firms can build architectures that preserve history, explain disagreement, and withstand regulatory scrutiny through accountable, reconstructable decision-making.

Eventual Consistency in Regulated Financial Services Data Platforms

In regulated financial services, eventual consistency is often treated as a technical weakness to be minimised or hidden. This article argues the opposite: eventual consistency is the only honest and defensible consistency model in a multi-system, regulator-supervised institution. Regulators do not require instantaneous agreement: they require explainability, reconstructability, and reasonableness at the time decisions were made. By treating eventual consistency as an explicit architectural and regulatory contract, firms can bound inconsistency, preserve historical belief, and strengthen audit defensibility rather than undermine it.

Why UK Financial Services Data Platforms Must Preserve Temporal Truth for Regulatory Compliance

A Regulatory Perspective (2025–2026). UK Financial Services regulation in 2025–2026 increasingly requires firms to demonstrate not just what is true today, but what was known at the time decisions were made. Across Consumer Duty, s166 reviews, AML/KYC, model risk, and operational resilience, regulators expect deterministic reconstruction of historical belief, supported by traceable evidence. This article explains where that requirement comes from, why traditional current-state platforms fail under scrutiny, and why preserving temporal truth inevitably drives architectures that capture change over time as a foundational control, not a technical preference.

Common Anti-Patterns in Financial Services Data Platforms

Financial Services data platforms rarely fail because of tools, scale, or performance. They fail because architectural decisions are left implicit, applied inconsistently, or overridden under pressure. This article documents the most common and damaging failure modes observed in large-scale FS data platforms, not as edge cases, but as predictable outcomes of well-intentioned instincts applied at the wrong layer. Each pattern shows how trust erodes quietly over time, often remaining invisible until audit, remediation, or regulatory scrutiny exposes the underlying architectural fault lines.

Operationalising Time, Consistency, and Freshness in a Financial Services Data Platform

This article translates the temporal doctrine established in Time, Consistency, and Freshness in a Financial Services Data Platform into enforceable architectural mechanisms. It focuses not on tools or technologies, but on the structural controls required to make time, consistency, and freshness unavoidable properties of a Financial Services (FS) data platform. The objective is simple: ensure that temporal correctness does not depend on developer discipline, operational goodwill, or institutional memory, but is instead enforced mechanically by the platform itself.

Databricks vs Snowflake vs Fabric vs Other Tech with SCD2 Bronze: Choosing the Right Operating Model

Choosing the right platform for implementing SCD2 in the Bronze layer is not a tooling decision but an operating model decision. At scale, SCD2 Bronze forces trade-offs around change capture, merge frequency, physical layout, cost governance, and long-term analytics readiness. Different platforms optimise for different assumptions about who owns those trade-offs. This article compares Databricks, Snowflake, Microsoft Fabric, and alternative technologies through that lens, with practical guidance for Financial Services organisations designing SCD2 Bronze layers that must remain scalable, auditable, and cost-effective over time.

From Partitioning to Liquid Clustering: Evolving SCD2 Bronze on Databricks at Scale

As SCD2 Bronze layers mature, even well-designed partitioning and ZORDER strategies can struggle under extreme scale, high-cardinality business keys, and evolving access patterns. This article examines why SCD2 Bronze datasets place unique pressure on static data layouts and introduces Databricks Liquid Clustering as a natural next step in their operational evolution. It explains when Liquid Clustering becomes appropriate, how it fits within regulated Financial Services environments, and how it preserves auditability while improving long-term performance and readiness for analytics and AI workloads.

From Graph Insight to Action: Decisions, Controls & Remediation in Financial Services Platforms

This article argues that financial services platforms fail not from lack of insight, but from weak architecture between detection and action. Graph analytics and models generate signals, not decisions. Collapsing the two undermines accountability, auditability, and regulatory defensibility. By separating signals, judgements, and decisions; treating decisions as time-qualified data; governing controls as executable policy; and enabling deterministic replay for remediation, platforms can move from reactive analytics to explainable, defensible action. In regulated environments, what matters is not what was known, but what was decided, when, and why.

Networks, Relationships & Financial Crime Graphs on the Bronze Layer

Financial crime rarely appears in isolated records; it emerges through networks of entities, relationships, and behaviours over time. This article explains why financial crime graphs must be treated as foundational, temporal structures anchored near the Bronze layer of a regulated data platform. It explores how relationships are inferred, versioned, and governed, why “known then” versus “known now” matters, and how poorly designed graphs undermine regulatory defensibility. Done correctly, crime graphs provide explainable, rebuildable network intelligence that stands up to scrutiny years later.

Probabilistic & Graph-Based Identity in Regulated Financial Services

This article argues that probabilistic and graph-based identity techniques are unavoidable in regulated Financial Services, but only defensible when tightly governed. Deterministic entity resolution remains the foundation, providing anchors, constraints, and auditability. Probabilistic scores and identity graphs introduce likelihood and network reasoning, not truth, and must be time-bound, versioned, and replayable. When anchored to immutable history, SCD2 discipline, and clear guardrails, these techniques enhance fraud and AML insight; without discipline, they create significant regulatory risk.

WTF is the Fellegi–Sunter Model? A Practical Guide to Record Matching in an Uncertain World

The Fellegi–Sunter model is the foundational probabilistic framework for record linkage: deciding whether two imperfect records refer to the same real-world entity. Rather than enforcing brittle matching rules, it treats linkage as a problem of weighing evidence under uncertainty. By modelling how fields behave for true matches versus non-matches, it produces interpretable scores and explicit decision thresholds. Despite decades of new tooling and machine learning, most modern matching systems still rest on this logic, often without acknowledging it.
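
The evidence-weighing idea can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the field names, the m- and u-probabilities, and the two thresholds are all hypothetical values chosen for the example, and a real system would estimate them from data (for instance with expectation-maximisation).

```python
import math

# Hypothetical per-field parameters (illustrative only, not estimated from data):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.80, 0.02),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum the log2 likelihood ratios for each compared field."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 10.0) -> str:
    """Apply two explicit thresholds (assumed values) to a total weight."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"  # typically routed to clerical review


if __name__ == "__main__":
    pair = {"surname": True, "date_of_birth": False, "postcode": True}
    w = match_weight(pair)
    print(f"weight={w:.2f} -> {classify(w)}")
```

With these illustrative numbers, agreement on all three fields lands above the upper threshold, while a pair that agrees on surname and postcode but disagrees on date of birth falls between the thresholds and is classified as a possible match. The score stays interpretable because each field's contribution can be read off separately.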

Aligning the Data Platform to Enterprise Data & AI Strategy

This article establishes the data platform as the execution engine of Enterprise Data & AI Strategy in Financial Services. It bridges executive strategy and technical delivery by showing how layered architecture (Bronze, Silver, Gold, Platinum), embedded governance, dual promotion lifecycles (North/South and East/West), and domain-aligned operating models turn strategic pillars (architecture & quality, governance, security & privacy, process & tools, and people & culture) into repeatable, regulator-ready outcomes. The result is a platform that delivers control, velocity, semantic alignment, and safe AI enablement at scale.

Migrating Legacy EDW Slowly-Changing Dimensions to Lakehouse Bronze

From 20-year-old warehouse SCDs to a modern temporal backbone you can trust. This article lays out a practical, regulator-aware playbook for migrating legacy EDW SCD dimensions to a modern SCD2 Bronze layer in a medallion/lakehouse architecture. It covers what you are really migrating (semantics, not just tables), how to treat the EDW as a source system, how to build canonical SCD2 Bronze, how to run both platforms in parallel, and how to prove to auditors and regulators that nothing has been lost or corrupted in the process.

Enterprise Point-in-Time (PIT) Reconstruction: The Regulatory Playbook

This article sets out the definitive regulatory playbook for enterprise Point-in-Time (PIT) reconstruction in UK Financial Services. It explains why PIT is now a supervisory expectation, driven by PRA/FCA reviews, Consumer Duty, s166 investigations, AML/KYC forensics, and model risk, and draws a clear distinction between “state as known” and “state as now known”. Covering SCD2 foundations, entity resolution, precedence versioning, multi-domain alignment, temporal repair, and reproducible rebuild patterns, it shows how to construct a deterministic, explainable PIT engine that can withstand audit, replay history reliably, and defend regulatory outcomes with confidence.
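
One way to picture the difference between the two states is to keep two dates on every version: when the fact applies, and when the platform learned it. The Python sketch below is an assumption about how that distinction might be modelled, not the article's design; the `Version` fields and the `state` helper are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    effective_from: date  # when the fact applies in the business world
    recorded_at: date     # when the platform first held this version


def state(versions: list[Version], key: str, effective_on: date,
          known_on: Optional[date] = None) -> Optional[str]:
    """Return the value in force on `effective_on`.

    With `known_on` set, only versions recorded by that date are visible
    ("state as known"). With `known_on=None`, later corrections are applied
    as well ("state as now known").
    """
    candidates = [
        v for v in versions
        if v.key == key
        and v.effective_from <= effective_on
        and (known_on is None or v.recorded_at <= known_on)
    ]
    if not candidates:
        return None
    # Latest effective date wins; ties go to the most recently recorded version.
    return max(candidates, key=lambda v: (v.effective_from, v.recorded_at)).value


if __name__ == "__main__":
    history = [
        Version("C1", "low", date(2023, 1, 1), date(2023, 1, 1)),
        # Back-dated correction: applies from March, only recorded in June.
        Version("C1", "high", date(2023, 3, 1), date(2023, 6, 15)),
    ]
    april = date(2023, 4, 1)
    print(state(history, "C1", april, known_on=april))  # state as known
    print(state(history, "C1", april))                  # state as now known
```

In this example the customer's rating on 1 April reads as "low" when reconstructed as known at the time, and "high" once the back-dated correction is taken into account. Both answers are deterministic because no version is ever overwritten.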

Building Regulator-Defensible Enterprise RAG Systems (FCA/PRA/SMCR)

This article defines what regulator-defensible enterprise Retrieval-Augmented Generation (RAG) looks like in Financial Services (at least in 2025–2026). Rather than focusing on model quality, it frames RAG through the questions regulators actually ask: what information was used, can the answer be reproduced, who is accountable, and how risk is controlled. It sets out minimum standards for context provenance, audit-grade logging, temporal and precedence-aware retrieval, human-in-the-loop escalation, and replayability. The result is a clear distinction between RAG prototypes and enterprise systems that can survive PRA/FCA and SMCR scrutiny.

Temporal RAG: Retrieving “State as Known on Date X” for LLMs in Financial Services

This article explains why standard Retrieval-Augmented Generation (RAG) silently corrupts history in Financial Services by answering past questions with present-day truth. It introduces Temporal RAG: a regulator-defensible retrieval pattern that conditions every query on an explicit as_of timestamp and retrieves only from Point-in-Time (PIT) slices governed by SCD2 validity, precedence rules, and repair policies. Using concrete implementation patterns and audit reconstruction examples, it shows how to make LLM retrieval reproducible, evidential, and safe for complaints, remediation, AML, and conduct-risk use cases.
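
The core of the pattern, filtering by SCD2 validity before any ranking happens, can be sketched as follows. This is a simplified illustration under stated assumptions: the `Chunk` fields, `pit_slice`, and `retrieve` are hypothetical names, keyword overlap stands in for a real embedding search, and precedence rules and repair policies are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the version is still current


def pit_slice(chunks: list[Chunk], as_of: datetime) -> list[Chunk]:
    """Keep only chunks whose SCD2 validity window contains `as_of`."""
    return [
        c for c in chunks
        if c.valid_from <= as_of and (c.valid_to is None or as_of < c.valid_to)
    ]


def retrieve(chunks: list[Chunk], query: str, as_of: datetime, k: int = 3) -> list[Chunk]:
    """Rank by keyword overlap (a stand-in for vector search) within the PIT slice."""
    terms = set(query.lower().split())
    scored = []
    for chunk in pit_slice(chunks, as_of):
        overlap = len(terms & set(chunk.text.lower().split()))
        if overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


if __name__ == "__main__":
    corpus = [
        Chunk("fees-v1", "overdraft fee is 10 pounds",
              datetime(2022, 1, 1), datetime(2023, 1, 1)),
        Chunk("fees-v2", "overdraft fee is 5 pounds",
              datetime(2023, 1, 1), None),
    ]
    for hit in retrieve(corpus, "overdraft fee", as_of=datetime(2022, 6, 1)):
        print(hit.doc_id, "->", hit.text)
```

Because the validity filter runs before ranking, a question about mid-2022 can only ever see the fee schedule that was in force then; the present-day version is never a candidate. Logging the `as_of` value alongside the returned `doc_id`s is what would make the retrieval replayable.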
