Tag Archives: Data Lineage

Authority, Truth, and Belief in Financial Services Data Platforms

Financial services data architectures often fail by asking the wrong question: “Which system is the system of record?” This article argues that regulated firms operate with multiple systems of authority, while truth exists outside systems altogether. What data platforms actually manage is institutional belief: what the firm believed at a given time, based on available evidence. By separating authority, truth, and belief, firms can build architectures that preserve history, explain disagreement, and withstand regulatory scrutiny through accountable, reconstructable decision-making.

Continue reading

Why UK Financial Services Data Platforms Must Preserve Temporal Truth for Regulatory Compliance

A Regulatory Perspective (2025–2026). UK Financial Services regulation in 2025–2026 increasingly requires firms to demonstrate not just what is true today, but what was known at the time decisions were made. Across Consumer Duty, s166 reviews, AML/KYC, model risk, and operational resilience, regulators expect deterministic reconstruction of historical belief, supported by traceable evidence. This article explains where that requirement comes from, why traditional current-state platforms fail under scrutiny, and why preserving temporal truth inevitably drives architectures that capture change over time as a foundational control, not a technical preference.

Continue reading

Databricks vs Snowflake vs Fabric vs Other Tech with SCD2 Bronze: Choosing the Right Operating Model

Choosing the right platform for implementing SCD2 in the Bronze layer is not a tooling decision but an operating model decision. At scale, SCD2 Bronze forces trade-offs around change capture, merge frequency, physical layout, cost governance, and long-term analytics readiness. Different platforms optimise for different assumptions about who owns those trade-offs. This article compares Databricks, Snowflake, Microsoft Fabric, and alternative technologies through that lens, with practical guidance for Financial Services organisations designing SCD2 Bronze layers that must remain scalable, auditable, and cost-effective over time.

Continue reading

From Partitioning to Liquid Clustering: Evolving SCD2 Bronze on Databricks at Scale

As SCD2 Bronze layers mature, even well-designed partitioning and ZORDER strategies can struggle under extreme scale, high-cardinality business keys, and evolving access patterns. This article examines why SCD2 Bronze datasets place unique pressure on static data layouts and introduces Databricks Liquid Clustering as a natural next step in their operational evolution. It explains when Liquid Clustering becomes appropriate, how it fits within regulated Financial Services environments, and how it preserves auditability while improving long-term performance and readiness for analytics and AI workloads.

Continue reading

Migrating Legacy EDW Slowly-Changing Dimensions to Lakehouse Bronze

From 20-year-old warehouse SCDs to a modern temporal backbone you can trust. This article lays out a practical, regulator-aware playbook for migrating legacy EDW SCD dimensions to a modern SCD2 Bronze layer in a medallion/lakehouse architecture. It covers what you are really migrating (semantics, not just tables), how to treat the EDW as a source system, how to build canonical SCD2 Bronze, how to run both platforms in parallel, and how to prove to auditors and regulators that nothing has been lost or corrupted in the process.

Continue reading

Enterprise Point-in-Time (PIT) Reconstruction: The Regulatory Playbook

This article sets out the definitive regulatory playbook for enterprise Point-in-Time (PIT) reconstruction in UK Financial Services. It explains why PIT is now a supervisory expectation, driven by PRA/FCA reviews, Consumer Duty, s166 investigations, AML/KYC forensics, and model risk, and makes a clear distinction between “state as known” and “state as now known”. Covering SCD2 foundations, entity resolution, precedence versioning, multi-domain alignment, temporal repair, and reproducible rebuild patterns, it shows how to construct a deterministic, explainable PIT engine that can withstand audit, replay history reliably, and defend regulatory outcomes with confidence.
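The “state as known” versus “state as now known” distinction is naturally bitemporal: every row carries both its business validity window and the date the platform learned it. A minimal Python sketch of the idea (field and function names are mine for illustration, not taken from the article):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Record:
    key: str
    value: str
    valid_from: date   # business validity start (inclusive)
    valid_to: date     # business validity end (exclusive)
    recorded_at: date  # when the platform learned this version

def state_as_known(records, key, as_of, knowledge_date):
    """Return the value valid on `as_of`, using only rows the platform
    had recorded by `knowledge_date`. Setting knowledge_date to today
    gives 'state as now known'; an earlier date gives 'state as known'."""
    candidates = [
        r for r in records
        if r.key == key
        and r.recorded_at <= knowledge_date
        and r.valid_from <= as_of < r.valid_to
    ]
    # If a later-recorded correction overlaps, the latest-recorded row wins.
    return max(candidates, key=lambda r: r.recorded_at).value if candidates else None
```

The same query date produces different answers depending on the knowledge date, which is exactly the property a PIT engine must preserve for audit replay.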

Continue reading

Building Regulator-Defensible Enterprise RAG Systems (FCA/PRA/SMCR)

This article defines what regulator-defensible enterprise Retrieval-Augmented Generation (RAG) looks like in Financial Services, as of 2025–2026. Rather than focusing on model quality, it frames RAG through the questions regulators actually ask: what information was used, can the answer be reproduced, who is accountable, and how risk is controlled. It sets out minimum standards for context provenance, audit-grade logging, temporal and precedence-aware retrieval, human-in-the-loop escalation, and replayability. The result is a clear distinction between RAG prototypes and enterprise systems that can survive PRA/FCA and SMCR scrutiny.

Continue reading

Temporal RAG: Retrieving “State as Known on Date X” for LLMs in Financial Services

This article explains why standard Retrieval-Augmented Generation (RAG) silently corrupts history in Financial Services by answering past questions with present-day truth. It introduces Temporal RAG: a regulator-defensible retrieval pattern that conditions every query on an explicit as_of timestamp and retrieves only from Point-in-Time (PIT) slices governed by SCD2 validity, precedence rules, and repair policies. Using concrete implementation patterns and audit reconstruction examples, it shows how to make LLM retrieval reproducible, evidential, and safe for complaints, remediation, AML, and conduct-risk use cases.
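The core mechanic is simple to state: filter the corpus down to the SCD2-valid slice for the `as_of` date before ranking anything. A toy sketch, where a naive keyword scorer stands in for embedding similarity and the dict layout is assumed for illustration:

```python
def pit_slice(chunks, as_of):
    """Keep only chunks whose SCD2 validity window covers `as_of`.
    Dates are ISO-8601 strings, which compare correctly as text;
    valid_to of None marks the currently open version."""
    return [
        c for c in chunks
        if c["valid_from"] <= as_of and (c["valid_to"] is None or as_of < c["valid_to"])
    ]

def temporal_retrieve(chunks, query_terms, as_of, k=3):
    """Rank only within the PIT slice. The scorer here is deliberately
    simple; the temporal filter, not the ranking, is the point."""
    scored = [
        (sum(term in c["text"].lower() for term in query_terms), c)
        for c in pit_slice(chunks, as_of)
    ]
    return [c for score, c in sorted(scored, key=lambda s: -s[0]) if score > 0][:k]
```

Because the filter runs before retrieval, a question about 2023 can never be answered with a 2024 version of the document.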

Continue reading

Integrating AI and LLMs into Regulated Financial Services Data Platforms

How AI fits into Bronze/Silver/Gold without breaking lineage, PIT, or SMCR: this article sets out a regulator-defensible approach to integrating AI and LLMs into UK Financial Services data platforms (structurally accurate for 2025/2026). It argues that AI must operate as a governed consumer and orchestrator of a temporal medallion architecture, not a parallel system. By defining four permitted integration patterns (PIT-aware RAG, controlled Bronze embeddings, anonymised fine-tuning, and agentic orchestration), it shows how to preserve lineage, point-in-time truth, and SMCR accountability while enabling practical AI use under PRA/FCA scrutiny.

Continue reading

Measuring Value in a Modern FS Data Platform: Framework for Understanding, Quantifying, and Communicating Data Value in FS

Measuring Value in a Modern FS Data Platform reframes how Financial Services organisations should evaluate data platforms. Rather than measuring pipelines, volumes, or dashboards, true value emerges from consumption, velocity, optionality, semantic alignment, and control. By landing raw data, accelerating delivery through reuse, organising around business domains, and unifying meaning in a layered Bronze–Silver–Gold–Platinum architecture, modern platforms enable faster decisions, richer analytics, regulatory confidence, and long-term adaptability. This article provides a practical, consumption-driven framework for CDOs and CIOs to quantify and communicate real data value.

Continue reading

Managing a Rapidly Growing SCD2 Bronze Layer on Snowflake: Best Practices and Architectural Guidance

Slowly Changing Dimension Type 2 (SCD2) patterns are widely used in Snowflake-based Financial Services platforms to preserve full historical change for regulatory, analytical, and audit purposes. However, Snowflake’s architecture differs fundamentally from file-oriented lakehouse systems, requiring distinct design and operational choices. This article provides practical, production-focused guidance for operating large-scale SCD2 Bronze layers on Snowflake. It explains how to use Streams, Tasks, micro-partition behaviour, batching strategies, and cost-aware configuration to ensure predictable performance, controlled spend, and long-term readiness for analytics and AI workloads in regulated environments.

Continue reading

Managing a Rapidly Growing SCD2 Bronze Layer on Databricks: Best Practices and Practical Guidance for AI-Ready Workloads

Slowly Changing Dimension Type 2 (SCD2) patterns are increasingly used in the Bronze layer of Databricks-based platforms to meet regulatory, analytical, and historical data requirements in Financial Services. However, SCD2 Bronze tables grow rapidly and can become costly, slow, and operationally fragile if not engineered carefully. This article provides practical, production-tested guidance for managing large-scale SCD2 Bronze layers on Databricks using Delta Lake. It focuses on performance, cost control, metadata health, and long-term readiness for analytics and AI workloads in regulated environments.
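For readers new to the pattern, the essential SCD2 apply step (close the open version, append a new one) can be sketched in plain Python; Delta Lake implementations express the same logic as a MERGE, and the row layout here is illustrative only:

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # conventional "open-ended" marker for the current version

def apply_scd2(history, key, attrs, effective):
    """Apply one change to an SCD2 history: close the open row for `key`
    and append a new current row effective from `effective`."""
    current = next(
        (r for r in history if r["key"] == key and r["valid_to"] == HIGH_DATE), None
    )
    if current and current["attrs"] == attrs:
        return history  # unchanged feed: no new version, keeps growth and reruns idempotent
    if current:
        current["valid_to"] = effective  # close the previous version
    history.append(
        {"key": key, "attrs": attrs, "valid_from": effective, "valid_to": HIGH_DATE}
    )
    return history
```

The no-op branch matters operationally: skipping unchanged rows is one of the cheapest ways to slow the growth the article warns about.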

Continue reading

Production-Grade Testing for SCD2 & Temporal Pipelines

The testing discipline that prevents regulatory failure, data corruption, and sleepless nights in Financial Services. Slowly Changing Dimension Type 2 (SCD2) pipelines underpin regulatory reporting, remediation, risk models, and point-in-time evidence across Financial Services — yet most are effectively untested. As data platforms adopt CDC, hybrid SCD2 patterns, and large-scale reprocessing, silent temporal defects become both more likely and harder to detect. This article sets out a production-grade testing discipline for SCD2 and temporal pipelines, focused on determinism, late data, precedence, replay, and PIT reconstruction. The goal is simple: prevent silent corruption and ensure SCD2 outputs remain defensible under regulatory scrutiny.
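Two invariants at the heart of such a discipline (exactly one open row per key, and no overlapping validity windows) can be checked mechanically on every run. An illustrative checker, with a dict row layout assumed rather than taken from the article:

```python
from collections import defaultdict

def check_scd2_invariants(rows):
    """Return a list of violation messages for two core SCD2 guarantees:
    exactly one open row (valid_to is None) per key, and no overlapping
    validity windows within a key."""
    violations = []
    by_key = defaultdict(list)
    for r in rows:
        by_key[r["key"]].append(r)
    for key, versions in by_key.items():
        open_rows = [r for r in versions if r["valid_to"] is None]
        if len(open_rows) != 1:
            violations.append(f"{key}: expected 1 open row, found {len(open_rows)}")
        ordered = sorted(versions, key=lambda r: r["valid_from"])
        for a, b in zip(ordered, ordered[1:]):
            # An earlier version must close at or before the next one starts.
            if a["valid_to"] is None or a["valid_to"] > b["valid_from"]:
                violations.append(
                    f"{key}: overlap between versions starting "
                    f"{a['valid_from']} and {b['valid_from']}"
                )
    return violations
```

Run as a post-merge assertion, checks like these turn silent temporal corruption into a loud pipeline failure.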

Continue reading

Event-Driven CDC to Correct SCD2 Bronze in 2025–2026

Broken history often stays hidden until remediation or skilled-person reviews. Why? Event-driven Change Data Capture fundamentally changes how history behaves in a data platform. When Financial Services organisations move from batch ingestion to streaming CDC, long-standing SCD2 assumptions quietly break — often without immediate symptoms. Late, duplicated, partial, or out-of-order events can silently corrupt Bronze history and undermine regulatory confidence. This article sets out what “correct” SCD2 means in a streaming world, why most implementations fail, and how to design Bronze pipelines that remain temporally accurate, replayable, and defensible under PRA/FCA scrutiny in 2025–2026.
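One way to keep streaming history deterministic is to rebuild it from events ordered by their source-side timestamps, after deduplication, so that arrival order, duplicates, and replays cannot change the result. A simplified sketch, with the event shape assumed for illustration:

```python
def rebuild_history(events):
    """Deterministically rebuild SCD2 history from unordered, possibly
    duplicated CDC events. Each event has a key, attrs, and a source-side
    change timestamp `ts`."""
    # 1. Deduplicate on (key, ts) so redelivery and replay are idempotent.
    unique = {(e["key"], e["ts"]): e for e in events}
    # 2. Order by source timestamp, not arrival order, so late and
    #    out-of-order events land in the correct historical position.
    ordered = sorted(unique.values(), key=lambda e: (e["key"], e["ts"]))
    history = []
    for e in ordered:
        prev = next(
            (r for r in reversed(history)
             if r["key"] == e["key"] and r["valid_to"] is None),
            None,
        )
        if prev:
            prev["valid_to"] = e["ts"]  # close the previous version at the change time
        history.append(
            {"key": e["key"], "attrs": e["attrs"], "valid_from": e["ts"], "valid_to": None}
        )
    return history
```

Because the output depends only on the event set, not the delivery sequence, replaying the stream reproduces byte-identical history, which is the property regulators probe.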

Continue reading

Golden-Source Resolution, Multi-Source Precedence, and Regulatory Point-in-Time Reporting on SCD2 Bronze

Why Deterministic Precedence Is the Line Between “Data Platform” and “Regulatory Liability”. Modern UK Financial Services organisations ingest customer, account, and product data from 5–20 different systems of record, each holding overlapping and often conflicting truth. Delivering a reliable “Customer 360” or “Account 360” requires deterministic, audit-defensible precedence rules, survivorship logic, temporal correction workflows, and regulatory point-in-time (PIT) reconstructions, all operating on an SCD2 Bronze layer. This article explains how mature banks resolve multi-source conflicts, maintain lineage, repair history when higher-precedence data arrives late, and produce FCA/PRA-ready temporal truth. It describes the real patterns used in Tier-1 institutions, and the architectural techniques required to make them deterministic, scalable, and regulator-defensible.
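Attribute-level survivorship under a fixed precedence order can be illustrated in a few lines; the source names and their ordering below are invented for the example:

```python
# Illustrative precedence, highest-authority source first.
PRECEDENCE = ["kyc_system", "core_banking", "crm"]

def resolve_golden(records_by_source):
    """Attribute-level survivorship: each attribute is taken from the
    highest-precedence source that supplies a non-null value, and the
    winning source is recorded for lineage."""
    golden, provenance = {}, {}
    for source in reversed(PRECEDENCE):  # apply lowest precedence first...
        for attr, value in records_by_source.get(source, {}).items():
            if value is not None:
                golden[attr] = value     # ...so higher-precedence sources overwrite
                provenance[attr] = source
    return golden, provenance
```

Because the rule is a pure function of the inputs and the precedence table, the same conflict always resolves the same way, which is what makes the result audit-defensible.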

Continue reading

Entity Resolution & Matching at Scale on the Bronze Layer

Entity resolution has become one of the hardest unsolved problems in modern UK Financial Services data platforms. This article sets out a Bronze-layer–anchored approach to resolving customers, accounts, and parties at scale using SCD2 as the temporal backbone. It explains how deterministic, fuzzy, and probabilistic matching techniques combine with blocking, clustering, and survivorship to produce persistent, auditable entity identities. By treating entity resolution as platform infrastructure rather than an application feature, firms can build defensible Customer 360 views, support point-in-time reconstruction, and meet growing FCA and PRA expectations.
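Blocking plus within-block scoring is the scalability core of this approach: compare only records that share a cheap key, then apply a similarity threshold inside each block. A toy sketch with an invented blocking key and a deliberately simple token-overlap scorer standing in for fuzzier measures such as Jaro-Winkler:

```python
import re

def block_key(name, postcode):
    """Cheap blocking key: first 3 letters of the normalised surname
    plus the postcode outward code."""
    surname = re.sub(r"[^a-z]", "", name.lower().split()[-1])
    return surname[:3] + "|" + postcode.split()[0].upper()

def token_similarity(a, b):
    """Jaccard overlap of name tokens; a stand-in for production scorers."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def candidate_pairs(records, threshold=0.5):
    """Compare only records sharing a blocking key, then score within blocks."""
    blocks = {}
    for r in records:
        blocks.setdefault(block_key(r["name"], r["postcode"]), []).append(r)
    pairs = []
    for members in blocks.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if token_similarity(members[i]["name"], members[j]["name"]) >= threshold:
                    pairs.append((members[i]["id"], members[j]["id"]))
    return pairs
```

Blocking turns an O(n²) all-pairs comparison into small within-block comparisons, which is what makes matching feasible at Bronze-layer volumes.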

Continue reading

From Embedded XML/JSON Blobs to Audit-Grade SCD2 Bronze

Financial Services platforms routinely ingest XML and JSON embedded in opaque fields, creating tension between audit fidelity and analytical usability. This article presents a regulator-defensible approach to handling such payloads in the Bronze layer: landing raw data immutably, extracting only high-value attributes, applying attribute-level SCD2, and managing schema drift without data loss. Using hybrid flattening, temporal compaction, and disciplined lineage, banks can transform messy blobs into audit-grade Bronze assets while preserving point-in-time reconstruction and regulatory confidence.
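The “land raw immutably, extract only high-value attributes” step can be sketched as follows; the attribute paths are hypothetical, and the hash ties each extracted row back to the landed blob for lineage:

```python
import hashlib
import json

# Hypothetical high-value attributes and their paths inside the payload.
HIGH_VALUE_PATHS = {
    "customer_id": ["party", "id"],
    "risk_rating": ["kyc", "risk", "rating"],
}

def extract_attributes(raw_blob):
    """Extract only the configured high-value attributes from a JSON blob,
    leaving the raw payload untouched and recording its hash for lineage.
    Missing paths yield None rather than failing, tolerating schema drift."""
    payload = json.loads(raw_blob)
    extracted = {}
    for attr, path in HIGH_VALUE_PATHS.items():
        node = payload
        for step in path:
            node = node.get(step, {}) if isinstance(node, dict) else {}
        extracted[attr] = None if isinstance(node, dict) else node
    return {
        "raw_sha256": hashlib.sha256(raw_blob.encode()).hexdigest(),
        **extracted,
    }
```

Attribute-level SCD2 then versions only these extracted columns, while the hash keeps every derived row traceable to its immutable source blob.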

Continue reading

Using SCD2 in the Bronze Layer with a Non-SCD2 Silver Layer: A Modern Data Architecture Pattern for UK Financial Services

UK Financial Services firms increasingly implement SCD2 history in the Bronze layer while providing simplified, non-SCD2 current-state views in the Silver layer. This pattern preserves full historical auditability for FCA/PRA compliance and regulatory forensics, while delivering cleaner, faster, easier-to-use datasets for analytics, BI, and data science. It separates “truth” from “insight,” improves governance, supports Data Mesh models, reduces duplicated logic, and enables deterministic rebuilds across the lakehouse. In regulated UK Financial Services today, it is the only pattern I have seen that satisfies the full, real-world constraint set with no material trade-offs.
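Deriving the non-SCD2 Silver current view from SCD2 Bronze is a straightforward projection onto the open versions; a minimal sketch with an assumed row layout:

```python
def current_view(scd2_rows):
    """Silver-style current-state projection: one row per business key,
    taken from the open SCD2 version (valid_to is None). Because this is
    a pure function of Bronze, it can be rebuilt deterministically."""
    return {
        r["key"]: r["attrs"]
        for r in scd2_rows
        if r["valid_to"] is None
    }
```

Since the current view is derived, not maintained, it can be dropped and rebuilt from Bronze at any time, which is what makes the deterministic-rebuild claim in the pattern hold.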

Continue reading