Tag Archives: Data Architecture

Eventual Consistency in Regulated Financial Services Data Platforms

In regulated financial services, eventual consistency is often treated as a technical weakness to be minimised or hidden. This article argues the opposite: eventual consistency is the only honest and defensible consistency model in a multi-system, regulator-supervised institution. Regulators do not require instantaneous agreement: they require explainability, reconstructability, and reasonableness at the time decisions were made. By treating eventual consistency as an explicit architectural and regulatory contract, firms can bound inconsistency, preserve historical belief, and strengthen audit defensibility rather than undermine it.

Continue reading

Networks, Relationships & Financial Crime Graphs on the Bronze Layer

Financial crime rarely appears in isolated records; it emerges through networks of entities, relationships, and behaviours over time. This article explains why financial crime graphs must be treated as foundational, temporal structures anchored near the Bronze layer of a regulated data platform. It explores how relationships are inferred, versioned, and governed, why “known then” versus “known now” matters, and how poorly designed graphs undermine regulatory defensibility. Done correctly, crime graphs provide explainable, rebuildable network intelligence that stands up to scrutiny years later.

Continue reading

Aligning the Data Platform to Enterprise Data & AI Strategy

This article establishes the data platform as the execution engine of Enterprise Data & AI Strategy in Financial Services. It bridges executive strategy and technical delivery by showing how layered architecture (Bronze, Silver, Gold, Platinum), embedded governance, dual promotion lifecycles (North/South and East/West), and domain-aligned operating models turn strategic pillars, architecture & quality, governance, security & privacy, process & tools, and people & culture, into repeatable, regulator-ready outcomes. The result is a platform that delivers control, velocity, semantic alignment, and safe AI enablement at scale.

Continue reading

Measuring Value in a Modern FS Data Platform: Framework for Understanding, Quantifying, and Communicating Data Value in FS

Measuring Value in a Modern FS Data Platform reframes how Financial Services organisations should evaluate data platforms. Rather than measuring pipelines, volumes, or dashboards, true value emerges from consumption, velocity, optionality, semantic alignment, and control. By landing raw data, accelerating delivery through reuse, organising around business domains, and unifying meaning in a layered Bronze–Silver–Gold–Platinum architecture, modern platforms enable faster decisions, richer analytics, regulatory confidence, and long-term adaptability. This article provides a practical, consumption-driven framework for CDOs and CIOs to quantify and communicate real data value.

Continue reading

East/West vs North/South Promotion Lifecycles: How Modern Financial Services Data Platforms Support Operational Stability and Analytical Freedom Simultaneously

This article argues that modern Financial Services (FS) data platforms must deliberately support two distinct but complementary promotion lifecycles. The well known and understood North/South lifecycle provides operational stability, governance, and regulatory safety for customer-facing and auditor-visible systems. In parallel, the East/West lifecycle enables analytical exploration, experimentation, and rapid innovation for data science and analytics teams. By mapping these lifecycles onto layered data architectures (Bronze to Platinum) and introducing clear promotion gates, FS organisations can protect operational integrity while sustaining analytical freedom and innovation.

Continue reading

Gold & Platinum Layer Architecture After Silver

Modern Financial Services data platforms require more than Bronze, Silver, and Gold layers to manage complexity, meaning, and governance. While Silver provides current-state truth and Gold delivers consumption-driven business meaning, neither resolves enterprise-wide semantics. This article introduces the Platinum layer as the conceptual truth layer, reconciling how different domains, systems, and analytical communities understand the same data. Together, Gold and Platinum bridge operational use, analytical insight, and long-lived domain semantics, enabling clarity, velocity, and governed understanding at scale.

Continue reading

Entity Resolution & Matching at Scale on the Bronze Layer

Entity resolution has become one of the hardest unsolved problems in modern UK Financial Services data platforms. This article sets out a Bronze-layer–anchored approach to resolving customers, accounts, and parties at scale using SCD2 as the temporal backbone. It explains how deterministic, fuzzy, and probabilistic matching techniques combine with blocking, clustering, and survivorship to produce persistent, auditable entity identities. By treating entity resolution as platform infrastructure rather than an application feature, firms can build defensible Customer 360 views, support point-in-time reconstruction, and meet growing FCA and PRA expectations.

Continue reading

Handling Embedded XML/JSON Blobs to Audit-Grade SCD2 Bronze

Financial Services platforms routinely ingest XML and JSON embedded in opaque fields, creating tension between audit fidelity and analytical usability. This article presents a regulator-defensible approach to handling such payloads in the Bronze layer: landing raw data immutably, extracting only high-value attributes, applying attribute-level SCD2, and managing schema drift without data loss. Using hybrid flattening, temporal compaction, and disciplined lineage, banks can transform messy blobs into audit-grade Bronze assets while preserving point-in-time reconstruction and regulatory confidence.

Continue reading

Advanced SCD2 Optimisation Techniques for Mature Data Platforms

Advanced SCD2 optimisation techniques are essential for mature Financial Services data platforms, where historical accuracy, regulatory traceability, and scale demands exceed the limits of basic SCD2 patterns. Attribute-level SCD2 significantly reduces storage and computation by tracking changes per column rather than per row. Hybrid SCD2 pipelines, combining lightweight delta logs with periodic MERGEs into the main Bronze table, minimise write amplification and improve reliability. Hash-based and probabilistic change detection eliminate unnecessary updates and accelerate temporal comparison at scale. Together, these techniques enable high-performance, audit-grade SCD2 in platforms such as Databricks, Snowflake, BigQuery, Iceberg, and Hudi, supporting the long-term data lineage and reconstruction needs of regulated UK Financial Services institutions.

Continue reading

Using SCD2 in the Bronze Layer with a Non-SCD2 Silver Layer: A Modern Data Architecture Pattern for UK Financial Services

UK Financial Services firms increasingly implement SCD2 history in the Bronze layer while providing simplified, non-SCD2 current-state views in the Silver layer. This pattern preserves full historical auditability for FCA/PRA compliance and regulatory forensics, while delivering cleaner, faster, easier-to-use datasets for analytics, BI, and data science. It separates “truth” from “insight,” improves governance, supports Data Mesh models, reduces duplicated logic, and enables deterministic rebuilds across the lakehouse. In regulated UK Financial Services today, it is the only pattern I have seen that satisfies the full, real-world constraint set with no material trade-offs.

Continue reading

WTF Is SCD? A Practical Guide to Slowly Changing Dimensions

Slowly Changing Dimensions (SCDs) are how data systems manage attributes that evolve without constantly rewriting history. They determine whether you keep only the latest value, preserve full historical versions, or maintain a limited snapshot of changes. The classic SCD types (0–3, plus hybrids) define different behaviours… from never updating values, to overwriting them, to keeping every version with timestamps. The real purpose of SCDs is to make an explicit choice about how truth should behave in your analytics: what should remain fixed, what should update, and what historical context matters. Modern data platforms make tracking changes easy, but they don’t make the design decisions for you. SCDs are ultimately the backbone of reliable, temporal, reality-preserving analytics.

Continue reading

MapReduce: A 20-Year Retrospective on How Jeffrey Dean and Sanjay Ghemawat Revolutionised Data Processing

This article provides a retrospective on the 20th anniversary of Jeffrey Dean and Sanjay Ghemawat’s seminal paper, “MapReduce: Simplified Data Processing on Large Clusters”. It explores the paper’s lasting impact on data processing, its influence on the development of big data technologies like Hadoop, and its broader implications for industries ranging from digital advertising to healthcare. The article also looks ahead to future trends in data processing, including stream processing and AI, emphasising how MapReduce’s principles will continue to shape the future of distributed computing.

Continue reading