Tag Archives: Data Lineage

Integrating AI and LLMs into Regulated Financial Services Data Platforms

How AI fits into Bronze/Silver/Gold without breaking lineage, PIT, or SMCR. This article sets out a regulator-defensible approach to integrating AI and LLMs into UK Financial Services data platforms (structurally accurate as of 2025/2026). It argues that AI must operate as a governed consumer and orchestrator of a temporal medallion architecture, not as a parallel system. By defining four permitted integration patterns (PIT-aware RAG, controlled Bronze embeddings, anonymised fine-tuning, and agentic orchestration), it shows how to preserve lineage, point-in-time truth, and SMCR accountability while enabling practical AI use under PRA/FCA scrutiny.


Measuring Value in a Modern FS Data Platform: Framework for Understanding, Quantifying, and Communicating Data Value in FS

Measuring Value in a Modern FS Data Platform reframes how Financial Services organisations should evaluate data platforms. Rather than measuring pipelines, volumes, or dashboards, true value emerges from consumption, velocity, optionality, semantic alignment, and control. By landing raw data, accelerating delivery through reuse, organising around business domains, and unifying meaning in a layered Bronze–Silver–Gold–Platinum architecture, modern platforms enable faster decisions, richer analytics, regulatory confidence, and long-term adaptability. This article provides a practical, consumption-driven framework for CDOs and CIOs to quantify and communicate real data value.


Managing a Rapidly Growing SCD2 Bronze Layer on Snowflake: Best Practices and Architectural Guidance

Slowly Changing Dimension Type 2 (SCD2) patterns are widely used in Snowflake-based Financial Services platforms to preserve full historical change for regulatory, analytical, and audit purposes. However, Snowflake’s architecture differs fundamentally from file-oriented lakehouse systems, requiring distinct design and operational choices. This article provides practical, production-focused guidance for operating large-scale SCD2 Bronze layers on Snowflake. It explains how to use Streams, Tasks, micro-partition behaviour, batching strategies, and cost-aware configuration to ensure predictable performance, controlled spend, and long-term readiness for analytics and AI workloads in regulated environments.


Managing a Rapidly Growing SCD2 Bronze Layer on Databricks: Best Practices and Practical Guidance ready for AI Workloads

Slowly Changing Dimension Type 2 (SCD2) patterns are increasingly used in the Bronze layer of Databricks-based platforms to meet regulatory, analytical, and historical data requirements in Financial Services. However, SCD2 Bronze tables grow rapidly and can become costly, slow, and operationally fragile if not engineered carefully. This article provides practical, production-tested guidance for managing large-scale SCD2 Bronze layers on Databricks using Delta Lake. It focuses on performance, cost control, metadata health, and long-term readiness for analytics and AI workloads in regulated environments.


Production-Grade Testing for SCD2 & Temporal Pipelines

The testing discipline that prevents regulatory failure, data corruption, and sleepless nights in Financial Services. Slowly Changing Dimension Type 2 pipelines underpin regulatory reporting, remediation, risk models, and point-in-time evidence across Financial Services — yet most are effectively untested. As data platforms adopt CDC, hybrid SCD2 patterns, and large-scale reprocessing, silent temporal defects become both more likely and harder to detect. This article sets out a production-grade testing discipline for SCD2 and temporal pipelines, focused on determinism, late data, precedence, replay, and PIT reconstruction. The goal is simple: prevent silent corruption and ensure SCD2 outputs remain defensible under regulatory scrutiny.


Event-Driven CDC to Correct SCD2 Bronze in 2025–2026

Broken history often stays hidden until remediation or skilled-person reviews. Why? Event-driven Change Data Capture fundamentally changes how history behaves in a data platform. When Financial Services organisations move from batch ingestion to streaming CDC, long-standing SCD2 assumptions quietly break — often without immediate symptoms. Late, duplicated, partial, or out-of-order events can silently corrupt Bronze history and undermine regulatory confidence. This article sets out what “correct” SCD2 means in a streaming world, why most implementations fail, and how to design Bronze pipelines that remain temporally accurate, replayable, and defensible under PRA/FCA scrutiny in 2025–2026.
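To make the idea of replayable history concrete, here is a minimal sketch that rebuilds SCD2 intervals from change events that may arrive duplicated or out of order. It is an illustration under stated assumptions, not the article's implementation: the `ChangeEvent` and `Scd2Row` types, the `event_id` and `effective_at` fields, and the `rebuild_history` function are all hypothetical names. It assumes duplicates carry identical payloads, and it does not compact consecutive rows with the same value.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical CDC event shape, for illustration only.
    event_id: str           # unique per source change, used to drop duplicates
    key: str                # business key of the changed record
    value: str              # new attribute value
    effective_at: datetime  # when the change took effect at source


@dataclass(frozen=True)
class Scd2Row:
    key: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None marks the open-ended current row


def rebuild_history(events: Iterable[ChangeEvent]) -> List[Scd2Row]:
    """Rebuild SCD2 rows so the result does not depend on arrival order."""
    # Collapse duplicates by event_id (assumes duplicates are identical).
    unique = {e.event_id: e for e in events}
    # Order by source-effective time, with event_id as a stable tie-break.
    ordered = sorted(
        unique.values(), key=lambda e: (e.key, e.effective_at, e.event_id)
    )
    rows: List[Scd2Row] = []
    for i, event in enumerate(ordered):
        has_next = i + 1 < len(ordered) and ordered[i + 1].key == event.key
        # Each row is closed by the next change for the same key, if any.
        valid_to = ordered[i + 1].effective_at if has_next else None
        rows.append(Scd2Row(event.key, event.value, event.effective_at, valid_to))
    return rows


t1, t2 = datetime(2025, 1, 1), datetime(2025, 2, 1)
e1 = ChangeEvent("evt-1", "A", "v1", t1)
e2 = ChangeEvent("evt-2", "A", "v2", t2)

# The later event arrives first, and is then delivered a second time.
history = rebuild_history([e2, e1, e2])
```

Because the rows are derived from the de-duplicated, source-ordered event set rather than from arrival order, replaying the same events in any order yields the same history.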


Golden-Source Resolution, Multi-Source Precedence, and Regulatory Point-in-Time Reporting on SCD2 Bronze

Why Deterministic Precedence Is the Line Between “Data Platform” and “Regulatory Liability”. Modern UK Financial Services organisations ingest customer, account, and product data from 5–20 different systems of record, each holding overlapping and often conflicting truth. Delivering a reliable “Customer 360” or “Account 360” requires deterministic, audit-defensible precedence rules, survivorship logic, temporal correction workflows, and regulatory point-in-time (PIT) reconstructions, all operating on an SCD2 Bronze layer. This article explains how mature banks resolve multi-source conflicts, maintain lineage, rebalance history when higher-precedence data arrives late, and produce FCA/PRA-ready temporal truth. It describes the real patterns used in Tier-1 institutions, and the architectural techniques required to make them deterministic, scalable, and regulator-defensible.
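As a rough illustration of what "deterministic precedence" can mean in practice, the sketch below picks a surviving value from conflicting source records using a fixed ranking and explicit tie-breaks. The source names, the `PRECEDENCE` ranking, the `SourceValue` type, and the `resolve` function are assumptions made for this example, not rules taken from the article.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class SourceValue(NamedTuple):
    source: str            # system of record that supplied the value
    value: str             # the candidate attribute value
    observed_at: datetime  # when the source reported it


# Hypothetical ranking: a lower number means higher precedence.
PRECEDENCE = {"core_banking": 0, "crm": 1, "web_form": 2}
EPOCH = datetime(1970, 1, 1)


def resolve(candidates: Iterable[SourceValue]) -> Optional[SourceValue]:
    """Pick one survivor; the result never depends on input order."""
    known = [c for c in candidates if c.source in PRECEDENCE]
    if not known:
        return None
    return min(
        known,
        key=lambda c: (
            PRECEDENCE[c.source],                         # 1. source rank
            -(c.observed_at - EPOCH).total_seconds(),     # 2. newest first
            c.value,                                      # 3. final tie-break
        ),
    )


core = SourceValue("core_banking", "1 High St", datetime(2024, 1, 10))
crm = SourceValue("crm", "9 Low Rd", datetime(2024, 5, 2))
unranked = SourceValue("legacy_feed", "Unknown", datetime(2024, 6, 1))

winner = resolve([crm, unranked, core])
```

Here the higher-ranked source wins even though the CRM record is more recent, and sources outside the ranking are ignored rather than guessed at. The explicit tie-breaks are what keep the outcome identical across replays.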


Entity Resolution & Matching at Scale on the Bronze Layer

Entity resolution has become one of the hardest unsolved problems in modern UK Financial Services data platforms. This article sets out a Bronze-layer–anchored approach to resolving customers, accounts, and parties at scale using SCD2 as the temporal backbone. It explains how deterministic, fuzzy, and probabilistic matching techniques combine with blocking, clustering, and survivorship to produce persistent, auditable entity identities. By treating entity resolution as platform infrastructure rather than an application feature, firms can build defensible Customer 360 views, support point-in-time reconstruction, and meet growing FCA and PRA expectations.


Handling Embedded XML/JSON Blobs to Audit-Grade SCD2 Bronze

Financial Services platforms routinely ingest XML and JSON embedded in opaque fields, creating tension between audit fidelity and analytical usability. This article presents a regulator-defensible approach to handling such payloads in the Bronze layer: landing raw data immutably, extracting only high-value attributes, applying attribute-level SCD2, and managing schema drift without data loss. Using hybrid flattening, temporal compaction, and disciplined lineage, banks can transform messy blobs into audit-grade Bronze assets while preserving point-in-time reconstruction and regulatory confidence.


Using SCD2 in the Bronze Layer with a Non-SCD2 Silver Layer: A Modern Data Architecture Pattern for UK Financial Services

UK Financial Services firms increasingly implement SCD2 history in the Bronze layer while providing simplified, non-SCD2 current-state views in the Silver layer. This pattern preserves full historical auditability for FCA/PRA compliance and regulatory forensics, while delivering cleaner, faster, easier-to-use datasets for analytics, BI, and data science. It separates “truth” from “insight,” improves governance, supports Data Mesh models, reduces duplicated logic, and enables deterministic rebuilds across the lakehouse. In regulated UK Financial Services today, it is the only pattern I have seen that satisfies the full, real-world constraint set with no material trade-offs.
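The split between SCD2 Bronze and current-state Silver can be sketched in a few lines. This is a simplified illustration, assuming half-open validity intervals `[valid_from, valid_to)` with `None` marking the current row; the `BronzeRow` type and the `silver_current` and `as_of` functions are hypothetical names, not part of any platform API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class BronzeRow:
    # Hypothetical SCD2 Bronze row, for illustration only.
    customer_id: str
    address: str
    valid_from: date
    valid_to: Optional[date]  # None marks the open-ended current version


def silver_current(rows: Iterable[BronzeRow]) -> Dict[str, str]:
    """Non-SCD2 Silver view: only the current version of each customer."""
    return {r.customer_id: r.address for r in rows if r.valid_to is None}


def as_of(rows: Iterable[BronzeRow], when: date) -> Dict[str, str]:
    """Point-in-time view: the version valid on the given date."""
    return {
        r.customer_id: r.address
        for r in rows
        if r.valid_from <= when and (r.valid_to is None or when < r.valid_to)
    }


bronze = [
    BronzeRow("C1", "1 High St", date(2023, 1, 1), date(2024, 6, 1)),
    BronzeRow("C1", "9 Low Rd", date(2024, 6, 1), None),
    BronzeRow("C2", "4 Mill Ln", date(2022, 3, 15), None),
]

current = silver_current(bronze)
historic = as_of(bronze, date(2024, 1, 1))
```

Silver is derived entirely from Bronze, so it can be dropped and rebuilt at any time, while point-in-time questions are answered from Bronze directly.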
