Tag Archives: UK FS SCD2 Bronze

The “Land It Early, Manage It Early” Series

This series sets out a practical doctrine for building data platforms in highly regulated Financial Services environments. Its core premise is simple: firms must be able to reconstruct what was known, when it was known, and why decisions were made — using preserved evidence, not hindsight.

The articles develop an architectural pattern that treats temporal fidelity as foundational. Historical state is captured early at ingestion, governed before consumption, and only then transformed for analytics, reporting, and AI. Slowly Changing Dimension Type 2 (SCD2) is used as enabling infrastructure rather than a modelling afterthought, allowing platforms to scale while remaining regulator-defensible.

Across the series, this approach is applied to real operational concerns: ingestion, identity, multi-source precedence, point-in-time reconstruction, consumption layers, AI integration, cost, security, and operating models. Taken together, the work describes what “land it early, manage it early” looks like in practice for modern regulated data platforms.

Probabilistic & Graph-Based Identity in Regulated Financial Services

This article argues that probabilistic and graph-based identity techniques are unavoidable in regulated Financial Services, but only defensible when tightly governed. Deterministic entity resolution remains the foundation, providing anchors, constraints, and auditability. Probabilistic scores and identity graphs introduce likelihood and network reasoning, not truth, and must be time-bound, versioned, and replayable. When anchored to immutable history, SCD2 discipline, and clear guardrails, these techniques enhance fraud and AML insight; without discipline, they create significant regulatory risk.

Continue reading

WTF is the Fellegi–Sunter Model? A Practical Guide to Record Matching in an Uncertain World

The Fellegi–Sunter model is the foundational probabilistic framework for record linkage… deciding whether two imperfect records refer to the same real-world entity. Rather than enforcing brittle matching rules, it treats linkage as a problem of weighing evidence under uncertainty. By modelling how fields behave for true matches versus non-matches, it produces interpretable scores and explicit decision thresholds. Despite decades of new tooling and machine learning, most modern matching systems still rest on this logic… often without acknowledging it.

Continue reading

Aligning the Data Platform to Enterprise Data & AI Strategy

This article establishes the data platform as the execution engine of Enterprise Data & AI Strategy in Financial Services. It bridges executive strategy and technical delivery by showing how layered architecture (Bronze, Silver, Gold, Platinum), embedded governance, dual promotion lifecycles (North/South and East/West), and domain-aligned operating models turn strategic pillars, architecture & quality, governance, security & privacy, process & tools, and people & culture, into repeatable, regulator-ready outcomes. The result is a platform that delivers control, velocity, semantic alignment, and safe AI enablement at scale.

Continue reading

Migrating Legacy EDW Slowly-Changing Dimensions to Lakehouse Bronze

From 20-year-old warehouse SCDs to a modern temporal backbone you can trust. This article lays out a practical, regulator-aware playbook for migrating legacy EDW SCD dimensions to a modern SCD2 Bronze layer in a medallion/lakehouse architecture. It covers what you are really migrating (semantics, not just tables), how to treat the EDW as a source system, how to build canonical SCD2 Bronze, how to run both platforms in parallel, and how to prove to auditors and regulators that nothing has been lost or corrupted in the process.

Continue reading

Enterprise Point-in-Time (PIT) Reconstruction: The Regulatory Playbook

This article sets out the definitive regulatory playbook for enterprise Point-in-Time (PIT) reconstruction in UK Financial Services. It explains why PIT is now a supervisory expectation: driven by PRA/FCA reviews, Consumer Duty, s166 investigations, AML/KYC forensics, and model risk, and makes a clear distinction between “state as known” and “state as now known”. Covering SCD2 foundations, entity resolution, precedence versioning, multi-domain alignment, temporal repair, and reproducible rebuild patterns, it shows how to construct a deterministic, explainable PIT engine that can withstand audit, replay history reliably, and defend regulatory outcomes with confidence.

Continue reading

Building Regulator-Defensible Enterprise RAG Systems (FCA/PRA/SMCR)

This article defines what regulator-defensible enterprise Retrieval-augmented generation (RAG) looks like in Financial Services (at least in 2025–2026). Rather than focusing on model quality, it frames RAG through the questions regulators actually ask: what information was used, can the answer be reproduced, who is accountable, and how risk is controlled. It sets out minimum standards for context provenance, audit-grade logging, temporal and precedence-aware retrieval, human-in-the-loop escalation, and replayability. The result is a clear distinction between RAG prototypes and enterprise systems that can survive PRA/FCA and SMCR scrutiny.

Continue reading

Temporal RAG: Retrieving “State as Known on Date X” for LLMs in Financial Services

This article explains why standard Retrieval-Augmented Generation (RAG) silently corrupts history in Financial Services by answering past questions with present-day truth. It introduces Temporal RAG: a regulator-defensible retrieval pattern that conditions every query on an explicit as_of timestamp and retrieves only from Point-in-Time (PIT) slices governed by SCD2 validity, precedence rules, and repair policies. Using concrete implementation patterns and audit reconstruction examples, it shows how to make LLM retrieval reproducible, evidential, and safe for complaints, remediation, AML, and conduct-risk use cases.

Continue reading

Integrating AI and LLMs into Regulated Financial Services Data Platforms

How AI fits into Bronze/Silver/Gold without breaking lineage, PIT, or SMCR: This article sets out a regulator-defensible approach to integrating AI and LLMs into UK Financial Services data platforms (structurally accurate for 2025/2026). It argues that AI must operate as a governed consumer and orchestrator of a temporal medallion architecture, not a parallel system. By defining four permitted integration patterns, PIT-aware RAG, controlled Bronze embeddings, anonymised fine-tuning, and agentic orchestration, it shows how to preserve lineage, point-in-time truth, and SMCR accountability while enabling practical AI use under PRA/FCA scrutiny.

Continue reading

Foundational Architecture Decisions in a Financial Services Data Platform

This article defines a comprehensive architectural doctrine for modern Financial Services data platforms, separating precursor decisions (what must be true for trust and scale) from foundational decisions (how the platform behaves under regulation, time, and organisational pressure). It explains why ingestion maximalism, streaming-first eventual consistency, transactional processing at the edge, domain-first design, and freshness as a business contract are non-negotiable in FS. Through detailed narrative and explicit anti-patterns, it shows how these decisions preserve optionality, enable regulatory defensibility, support diverse communities, and prevent the systemic failure modes that quietly undermine large-scale financial data platforms.

Continue reading

Time, Consistency, and Freshness in a Financial Services Data Platform

This article explains why time, consistency, and freshness are first-class architectural concerns in modern Financial Services data platforms. It shows how truth in FS is inherently time-qualified, why event time must be distinguished from processing time, and why eventual consistency is a requirement rather than a compromise. By mapping these concepts directly to Bronze, Silver, Gold, and Platinum layers, the article demonstrates how platforms preserve historical truth, deliver reliable current-state views, and enforce freshness as an explicit business contract rather than an accidental outcome.

Continue reading

Measuring Value in a Modern FS Data Platform: Framework for Understanding, Quantifying, and Communicating Data Value in FS

Measuring Value in a Modern FS Data Platform reframes how Financial Services organisations should evaluate data platforms. Rather than measuring pipelines, volumes, or dashboards, true value emerges from consumption, velocity, optionality, semantic alignment, and control. By landing raw data, accelerating delivery through reuse, organising around business domains, and unifying meaning in a layered Bronze–Silver–Gold–Platinum architecture, modern platforms enable faster decisions, richer analytics, regulatory confidence, and long-term adaptability. This article provides a practical, consumption-driven framework for CDOs and CIOs to quantify and communicate real data value.

Continue reading

East/West vs North/South Promotion Lifecycles: How Modern Financial Services Data Platforms Support Operational Stability and Analytical Freedom Simultaneously

This article argues that modern Financial Services (FS) data platforms must deliberately support two distinct but complementary promotion lifecycles. The well known and understood North/South lifecycle provides operational stability, governance, and regulatory safety for customer-facing and auditor-visible systems. In parallel, the East/West lifecycle enables analytical exploration, experimentation, and rapid innovation for data science and analytics teams. By mapping these lifecycles onto layered data architectures (Bronze to Platinum) and introducing clear promotion gates, FS organisations can protect operational integrity while sustaining analytical freedom and innovation.

Continue reading

Consumers of a Financial Services Data Platform: Who They Are, What They Need, and How Modern Architecture Must Support Them

This article examines who consumes a modern Financial Services data platform and why their differing needs must shape its architecture. It identifies four core consumer groups, operational systems, analytics communities, finance and reconciliation functions, and governance and regulators, alongside additional emerging consumers. By analysing how each group interacts with data, the article explains why layered architectures, dual promotion flows, and semantic alignment are essential. Ultimately, it argues that platform value is defined by consumption, not ingestion or technology choices.

Continue reading

Gold & Platinum Layer Architecture After Silver

Modern Financial Services data platforms require more than Bronze, Silver, and Gold layers to manage complexity, meaning, and governance. While Silver provides current-state truth and Gold delivers consumption-driven business meaning, neither resolves enterprise-wide semantics. This article introduces the Platinum layer as the conceptual truth layer, reconciling how different domains, systems, and analytical communities understand the same data. Together, Gold and Platinum bridge operational use, analytical insight, and long-lived domain semantics, enabling clarity, velocity, and governed understanding at scale.

Continue reading

Managing a Rapidly Growing SCD2 Bronze Layer on Snowflake: Best Practices and Architectural Guidance

Slowly Changing Dimension Type 2 (SCD2) patterns are widely used in Snowflake-based Financial Services platforms to preserve full historical change for regulatory, analytical, and audit purposes. However, Snowflake’s architecture differs fundamentally from file-oriented lakehouse systems, requiring distinct design and operational choices. This article provides practical, production-focused guidance for operating large-scale SCD2 Bronze layers on Snowflake. It explains how to use Streams, Tasks, micro-partition behaviour, batching strategies, and cost-aware configuration to ensure predictable performance, controlled spend, and long-term readiness for analytics and AI workloads in regulated environments.

Continue reading

Managing a Rapidly Growing SCD2 Bronze Layer on Databricks: Best Practices and Practical Guidance ready for AI Workloads

Slowly Changing Dimension Type 2 (SCD2) patterns are increasingly used in the Bronze layer of Databricks-based platforms to meet regulatory, analytical, and historical data requirements in Financial Services. However, SCD2 Bronze tables grow rapidly and can become costly, slow, and operationally fragile if not engineered carefully. This article provides practical, production-tested guidance for managing large-scale SCD2 Bronze layers on Databricks using Delta Lake. It focuses on performance, cost control, metadata health, and long-term readiness for analytics and AI workloads in regulated environments.

Continue reading

Production-Grade Testing for SCD2 & Temporal Pipelines

The testing discipline that prevents regulatory failure, data corruption, and sleepless nights in Financial Services. Slowly Changing Dimension Type 2 pipelines underpin regulatory reporting, remediation, risk models, and point-in-time evidence across Financial Services — yet most are effectively untested. As data platforms adopt CDC, hybrid SCD2 patterns, and large-scale reprocessing, silent temporal defects become both more likely and harder to detect. This article sets out a production-grade testing discipline for SCD2 and temporal pipelines, focused on determinism, late data, precedence, replay, and PIT reconstruction. The goal is simple: prevent silent corruption and ensure SCD2 outputs remain defensible under regulatory scrutiny.

Continue reading