Tag Archives: Entity Resolution

Series Wrap-Up: Reconstructing Time, Truth, and Trust in UK Financial Services Data Platforms

This series explored how UK Financial Services data platforms can preserve temporal truth, reconstruct institutional belief, and withstand regulatory scrutiny at scale. Beginning with foundational concepts such as SCD2 and event modelling, it developed into a comprehensive architectural pattern centred on an audit-grade Bronze layer, non-SCD Silver consumption, and point-in-time defensibility. Along the way, it addressed operational reality, governance, cost, AI integration, and regulatory expectations. This final article brings the work together, offering a structured map of the series and a coherent lens for understanding how modern, regulated data platforms actually succeed. Taken together, this body of work describes what I refer to as a “land it early, manage it early” data platform architecture for regulated industries.

Continue reading

The 2026 UK Financial Services Lakehouse Reference Architecture

An opinionated but practical blueprint for regulated, temporal, multi-domain data platforms, focused on authority, belief, and point-in-time defensibility. This article lays out a reference architecture for UK FS in 2026: not as a rigid prescription, but as a description of what “good” now looks like in banks, insurers, payments firms, wealth platforms, and capital markets organisations operating under FCA/PRA supervision.

Continue reading

Networks, Relationships & Financial Crime Graphs on the Bronze Layer

Financial crime rarely appears in isolated records; it emerges through networks of entities, relationships, and behaviours over time. This article explains why financial crime graphs must be treated as foundational, temporal structures anchored near the Bronze layer of a regulated data platform. It explores how relationships are inferred, versioned, and governed, why “known then” versus “known now” matters, and how poorly designed graphs undermine regulatory defensibility. Done correctly, crime graphs provide explainable, rebuildable network intelligence that stands up to scrutiny years later.
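
As a rough sketch of what “inferred, versioned” relationships can look like in practice, the hypothetical edge structure below carries both the real-world interval a relationship held and the interval the platform believed it. Every field name here is an illustrative assumption rather than a prescribed schema, but this dual-interval shape is what makes “known then” versus “known now” queries answerable.

```python
# Hypothetical temporal, versioned crime-graph edge. All field names are
# illustrative; the point is that each relationship carries both the
# real-world interval it held and the interval the platform believed it.
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Edge:
    src: str             # e.g. a party identifier
    dst: str             # e.g. an account or counterparty identifier
    relation: str        # e.g. "controls", "transacts_with"
    valid_from: date     # when the relationship held in the real world
    valid_to: date       # exclusive end (date.max if still current)
    believed_from: date  # when the platform inferred or recorded it
    believed_to: date    # when it was retracted/superseded (date.max if not)
    source: str          # lineage: the rule, model, or feed that asserted it

def network_as_known(edges, as_of: date, knowledge_date: date):
    """Rebuild the graph as it stood at `as_of`, using only edges the
    platform believed at `knowledge_date`: "known then" vs "known now"."""
    adjacency = defaultdict(list)
    for e in edges:
        if (e.valid_from <= as_of < e.valid_to
                and e.believed_from <= knowledge_date < e.believed_to):
            adjacency[e.src].append((e.relation, e.dst, e.source))
    return adjacency
```

Keeping the asserting rule, model, or feed on every edge is what allows a network to be explained, not merely displayed, when it is challenged years later.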

Continue reading

Probabilistic & Graph-Based Identity in Regulated Financial Services

This article argues that probabilistic and graph-based identity techniques are unavoidable in regulated Financial Services, but only defensible when tightly governed. Deterministic entity resolution remains the foundation, providing anchors, constraints, and auditability. Probabilistic scores and identity graphs introduce likelihood and network reasoning, not truth, and must be time-bound, versioned, and replayable. When anchored to immutable history, SCD2 discipline, and clear guardrails, these techniques enhance fraud and AML insight; without discipline, they create significant regulatory risk.
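
As one way to picture “time-bound, versioned, and replayable”, here is a hypothetical match-decision record; the field names and version labels are invented for illustration. The idea it sketches is that a probabilistic link is never persisted as bare truth, but as evidence plus the exact model and policy versions that acted on it.

```python
# A minimal sketch of a governed, replayable identity-match decision.
# Field names and version labels are illustrative, not a standard schema.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass(frozen=True)
class MatchDecision:
    left_id: str
    right_id: str
    score: float             # probabilistic evidence, not truth
    model_version: str       # which scorer produced the evidence
    threshold_version: str   # which policy turned the score into an action
    decided_at: datetime     # when the belief was formed
    outcome: str             # "link" | "review" | "no_link"

decision = MatchDecision(
    left_id="party:1001", right_id="party:2042",
    score=0.93, model_version="fs-weights-2025-11",
    threshold_version="aml-policy-v7",
    decided_at=datetime(2026, 1, 5, tzinfo=timezone.utc),
    outcome="link",
)
# Persisting the whole decision record immutably, alongside Bronze history,
# is what makes the link explainable and re-derivable years later.
print(json.dumps(asdict(decision), default=str))
```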

Continue reading

WTF is the Fellegi–Sunter Model? A Practical Guide to Record Matching in an Uncertain World

The Fellegi–Sunter model is the foundational probabilistic framework for record linkage: deciding whether two imperfect records refer to the same real-world entity. Rather than enforcing brittle matching rules, it treats linkage as a problem of weighing evidence under uncertainty. By modelling how fields behave for true matches versus non-matches, it produces interpretable scores and explicit decision thresholds. Despite decades of new tooling and machine learning, most modern matching systems still rest on this logic, often without acknowledging it.
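
To make the evidence-weighing concrete, here is a minimal sketch of Fellegi–Sunter scoring; the m/u probabilities, field names, and thresholds are illustrative assumptions, not parameters from any real matching system.

```python
# Minimal Fellegi-Sunter scoring sketch. The m/u values and thresholds
# below are invented for illustration.
from math import log2

# m: P(field agrees | records are a true match)
# u: P(field agrees | records are a non-match)
FIELD_PARAMS = {
    "surname":       {"m": 0.95, "u": 0.02},
    "date_of_birth": {"m": 0.97, "u": 0.005},
    "postcode":      {"m": 0.90, "u": 0.01},
}

UPPER, LOWER = 8.0, 0.0  # illustrative decision thresholds (log2 scale)

def match_score(agreements: dict[str, bool]) -> float:
    """Sum field-level log-likelihood-ratio weights: agreement adds
    log2(m/u); disagreement adds log2((1-m)/(1-u))."""
    score = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]["m"], FIELD_PARAMS[field]["u"]
        score += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return score

def decide(score: float) -> str:
    if score >= UPPER:
        return "match"
    if score <= LOWER:
        return "non-match"
    return "clerical review"

score = match_score({"surname": True, "date_of_birth": True, "postcode": False})
print(round(score, 2), decide(score))  # 9.86 match
```

Because each field contributes an additive log-likelihood-ratio weight, every decision can be explained field by field, which is exactly where the model's interpretability claim comes from.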

Continue reading

Migrating Legacy EDW Slowly Changing Dimensions to Lakehouse Bronze

From 20-year-old warehouse SCDs to a modern temporal backbone you can trust. This article lays out a practical, regulator-aware playbook for migrating legacy EDW SCD dimensions to a modern SCD2 Bronze layer in a medallion/lakehouse architecture. It covers what you are really migrating (semantics, not just tables), how to treat the EDW as a source system, how to build canonical SCD2 Bronze, how to run both platforms in parallel, and how to prove to auditors and regulators that nothing has been lost or corrupted in the process.
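
As a sketch of what the parallel-run proof can look like, the hypothetical reconciliation below fingerprints each dimension row's history on both platforms and diffs the results; the function and column handling are illustrative assumptions, not a prescribed migration tool.

```python
# Hypothetical parallel-run reconciliation: hash each dimension row's
# business key, attributes, and validity window on both platforms, then
# diff. Column handling is illustrative.
import hashlib

def row_fingerprint(row: dict, columns: list[str]) -> str:
    """Deterministic hash over the columns that must agree byte-for-byte."""
    payload = "|".join(str(row.get(c)) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def reconcile(edw_rows, bronze_rows, key_cols, compare_cols):
    """Return the keys whose SCD2 history differs between platforms."""
    def index(rows):
        return {
            tuple(r[c] for c in key_cols): row_fingerprint(r, compare_cols)
            for r in rows
        }
    edw, bronze = index(edw_rows), index(bronze_rows)
    missing    = edw.keys() - bronze.keys()      # lost in migration
    extra      = bronze.keys() - edw.keys()      # invented by migration
    mismatched = {k for k in edw.keys() & bronze.keys() if edw[k] != bronze[k]}
    return missing, extra, mismatched
```

Zero rows in all three buckets, reproduced on every run across the parallel period, is the kind of evidence auditors and regulators actually want to see.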

Continue reading

Enterprise Point-in-Time (PIT) Reconstruction: The Regulatory Playbook

This article sets out the definitive regulatory playbook for enterprise Point-in-Time (PIT) reconstruction in UK Financial Services. It explains why PIT is now a supervisory expectation, driven by PRA/FCA reviews, Consumer Duty, s166 investigations, AML/KYC forensics, and model risk, and it draws a clear distinction between “state as known” and “state as now known”. Covering SCD2 foundations, entity resolution, precedence versioning, multi-domain alignment, temporal repair, and reproducible rebuild patterns, it shows how to construct a deterministic, explainable PIT engine that can withstand audit, replay history reliably, and defend regulatory outcomes with confidence.
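
As a rough illustration of the “state as known” versus “state as now known” distinction, the sketch below uses a hypothetical bitemporal SCD2 row; the field names (valid_from, recorded_at, superseded_at) are illustrative assumptions, not a prescribed schema.

```python
# A minimal bitemporal sketch separating business time (when a fact was
# true) from system time (when the platform learned it). Field names are
# illustrative.
from dataclasses import dataclass
from datetime import date

@dataclass
class Scd2Row:
    customer_id: str
    address: str
    valid_from: date      # business time: when the fact became true
    valid_to: date        # business time: exclusive end (date.max if open)
    recorded_at: date     # system time: when the platform learned it
    superseded_at: date   # system time: when a correction replaced it

def state_as_known(rows, cust, as_of, knowledge_cutoff):
    """What the platform believed about `as_of` using only rows it had
    recorded (and not yet superseded) by `knowledge_cutoff`."""
    return [r for r in rows
            if r.customer_id == cust
            and r.valid_from <= as_of < r.valid_to
            and r.recorded_at <= knowledge_cutoff < r.superseded_at]

def state_as_now_known(rows, cust, as_of):
    """Current best belief about `as_of`, including later corrections."""
    return state_as_known(rows, cust, as_of, date.today())

rows = [
    Scd2Row("C1", "1 Old St", date(2020, 1, 1), date.max,
            recorded_at=date(2020, 1, 2), superseded_at=date(2023, 6, 1)),
    Scd2Row("C1", "2 New Rd", date(2020, 1, 1), date.max,  # retroactive fix
            recorded_at=date(2023, 6, 1), superseded_at=date.max),
]
# A 2022 review saw "1 Old St"; a rebuild today sees "2 New Rd".
print(state_as_known(rows, "C1", date(2021, 1, 1), date(2022, 1, 1))[0].address)
print(state_as_now_known(rows, "C1", date(2021, 1, 1))[0].address)
```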

Continue reading

Entity Resolution & Matching at Scale on the Bronze Layer

Entity resolution has become one of the hardest unsolved problems in modern UK Financial Services data platforms. This article sets out a Bronze-layer–anchored approach to resolving customers, accounts, and parties at scale using SCD2 as the temporal backbone. It explains how deterministic, fuzzy, and probabilistic matching techniques combine with blocking, clustering, and survivorship to produce persistent, auditable entity identities. By treating entity resolution as platform infrastructure rather than an application feature, firms can build defensible Customer 360 views, support point-in-time reconstruction, and meet growing FCA and PRA expectations.
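
A deliberately simplified sketch of how blocking, tiered matching, and survivorship fit together is below; the record fields (nino, name, dob, postcode, updated_at) and the similarity threshold are illustrative assumptions, and a production pipeline would run the equivalent steps over SCD2 Bronze at scale.

```python
# Simplified blocking + tiered matching + survivorship sketch. Record
# fields and the 0.85 threshold are illustrative assumptions.
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(rec: dict) -> str:
    """Cheap blocking key: only records sharing it are ever compared,
    collapsing the O(n^2) comparison space into small candidate sets."""
    return rec["postcode"][:4].upper() + rec["name"][:2].upper()

def candidate_pairs(records: list[dict]):
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for group in blocks.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                yield group[i], group[j]

def is_match(a: dict, b: dict) -> bool:
    # Tier 1, deterministic: a shared strong identifier decides outright.
    if a.get("nino") and a.get("nino") == b.get("nino"):
        return True
    # Tier 2, fuzzy: high name similarity plus exact date of birth.
    name_sim = SequenceMatcher(None, a["name"], b["name"]).ratio()
    return name_sim > 0.85 and a["dob"] == b["dob"]

def survive(cluster: list[dict]) -> dict:
    """Survivorship: keep, per attribute, the most recently updated
    non-null value across all records matched into the cluster."""
    golden: dict = {}
    for rec in sorted(cluster, key=lambda r: r["updated_at"]):
        golden.update({k: v for k, v in rec.items() if v is not None})
    return golden
```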

Continue reading