Tag Archives: Medallion Architecture

Integrating AI and LLMs into Regulated Financial Services Data Platforms

How AI fits into Bronze/Silver/Gold without breaking lineage, point-in-time (PIT) truth, or SMCR accountability: This article sets out a regulator-defensible approach to integrating AI and LLMs into UK Financial Services data platforms (structurally accurate for 2025/2026). It argues that AI must operate as a governed consumer and orchestrator of a temporal medallion architecture, not as a parallel system. By defining four permitted integration patterns (PIT-aware RAG, controlled Bronze embeddings, anonymised fine-tuning, and agentic orchestration), it shows how to preserve lineage, PIT truth, and SMCR accountability while enabling practical AI use under PRA/FCA scrutiny.
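
To make the PIT-aware RAG pattern concrete, here is a minimal Snowflake-flavoured sketch: the temporal predicate is applied before similarity ranking, so retrieval only ever surfaces context that was true at the question’s as-of date. The gold.document_chunks table, its columns, and the :query_embedding bind are illustrative assumptions, not the article’s actual schema; VECTOR_COSINE_SIMILARITY is Snowflake’s vector function, and other platforms would substitute their own equivalent.

```sql
-- Illustrative only: PIT-aware retrieval for RAG over an SCD2-derived chunk table.
SELECT chunk_id,
       chunk_text,
       source_ref                                    -- lineage back to the governed layer
FROM   gold.document_chunks
WHERE  effective_from <= '2025-06-30'                -- the question's as-of date
  AND  (effective_to IS NULL OR effective_to > '2025-06-30')
ORDER BY VECTOR_COSINE_SIMILARITY(embedding, :query_embedding) DESC
LIMIT 10;
```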

Continue reading

Measuring Value in a Modern FS Data Platform: A Framework for Understanding, Quantifying, and Communicating Data Value in FS

Measuring Value in a Modern FS Data Platform reframes how Financial Services organisations should evaluate data platforms: true value emerges not from counting pipelines, volumes, or dashboards, but from consumption, velocity, optionality, semantic alignment, and control. By landing raw data, accelerating delivery through reuse, organising around business domains, and unifying meaning in a layered Bronze–Silver–Gold–Platinum architecture, modern platforms enable faster decisions, richer analytics, regulatory confidence, and long-term adaptability. This article provides a practical, consumption-driven framework for CDOs and CIOs to quantify and communicate real data value.

Continue reading

East/West vs North/South Promotion Lifecycles: How Modern Financial Services Data Platforms Support Operational Stability and Analytical Freedom Simultaneously

This article argues that modern Financial Services (FS) data platforms must deliberately support two distinct but complementary promotion lifecycles. The well-understood North/South lifecycle provides operational stability, governance, and regulatory safety for customer-facing and auditor-visible systems. In parallel, the East/West lifecycle enables analytical exploration, experimentation, and rapid innovation for data science and analytics teams. By mapping these lifecycles onto layered data architectures (Bronze to Platinum) and introducing clear promotion gates, FS organisations can protect operational integrity while sustaining analytical freedom and innovation.

Continue reading

Managing a Rapidly Growing SCD2 Bronze Layer on Snowflake: Best Practices and Architectural Guidance

Slowly Changing Dimension Type 2 (SCD2) patterns are widely used in Snowflake-based Financial Services platforms to preserve a complete history of change for regulatory, analytical, and audit purposes. However, Snowflake’s architecture differs fundamentally from file-oriented lakehouse systems, requiring distinct design and operational choices. This article provides practical, production-focused guidance for operating large-scale SCD2 Bronze layers on Snowflake. It explains how to use Streams, Tasks, micro-partition behaviour, batching strategies, and cost-aware configuration to ensure predictable performance, controlled spend, and long-term readiness for analytics and AI workloads in regulated environments.
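
As a sketch of how Streams and Tasks batch SCD2 maintenance, the hypothetical example below closes superseded rows and opens new versions in a single scheduled MERGE. All object names, the row_hash column, and the 15-minute schedule are illustrative assumptions; emitting each genuine change twice (once to close, once to open) is what lets one MERGE perform both halves of the SCD2 update.

```sql
-- Illustrative only: Stream + Task driving batched SCD2 upserts into Bronze.
CREATE OR REPLACE STREAM raw.customer_landing_strm ON TABLE raw.customer_landing;

CREATE OR REPLACE TASK bronze.apply_customer_scd2
  WAREHOUSE = etl_wh
  SCHEDULE  = '15 MINUTE'                             -- batch writes, not row-at-a-time
WHEN SYSTEM$STREAM_HAS_DATA('raw.customer_landing_strm')
AS
MERGE INTO bronze.customer_scd2 AS tgt
USING (
    -- keep only genuine changes, then emit each one twice:
    -- once to close the old version, once to open the new one
    SELECT c.*, a.action
    FROM (
        SELECT s.customer_id, s.payload, s.row_hash, s.loaded_at
        FROM   raw.customer_landing_strm s
        LEFT JOIN bronze.customer_scd2 t
               ON t.customer_id = s.customer_id AND t.effective_to IS NULL
        WHERE  t.customer_id IS NULL OR t.row_hash <> s.row_hash   -- skip no-op updates
    ) c
    CROSS JOIN (SELECT 'close' AS action UNION ALL SELECT 'open') a
) src
ON  tgt.customer_id = src.customer_id
AND tgt.effective_to IS NULL
AND src.action = 'close'
WHEN MATCHED THEN
    UPDATE SET tgt.effective_to = src.loaded_at                    -- close old version
WHEN NOT MATCHED AND src.action = 'open' THEN
    INSERT (customer_id, payload, row_hash, effective_from, effective_to)
    VALUES (src.customer_id, src.payload, src.row_hash, src.loaded_at, NULL);

ALTER TASK bronze.apply_customer_scd2 RESUME;
```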

Continue reading

Managing a Rapidly Growing SCD2 Bronze Layer on Databricks: Best Practices and Practical Guidance, Ready for AI Workloads

Slowly Changing Dimension Type 2 (SCD2) patterns are increasingly used in the Bronze layer of Databricks-based platforms to meet regulatory, analytical, and historical data requirements in Financial Services. However, SCD2 Bronze tables grow rapidly and can become costly, slow, and operationally fragile if not engineered carefully. This article provides practical, production-tested guidance for managing large-scale SCD2 Bronze layers on Databricks using Delta Lake. It focuses on performance, cost control, metadata health, and long-term readiness for analytics and AI workloads in regulated environments.
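
For flavour, a hypothetical maintenance routine for a fast-growing Delta SCD2 table might look like the sketch below. The table name, Z-ORDER columns, and retention windows are assumptions to be tuned per workload, not prescriptions.

```sql
-- Illustrative only: routine maintenance for a large Delta SCD2 Bronze table.

-- Compact small files and co-locate the columns SCD2 lookups filter on,
-- so "current row" probes touch fewer files.
OPTIMIZE bronze.customer_scd2
ZORDER BY (customer_id, effective_to);

-- Remove files no longer referenced by the table, keeping enough history
-- for time travel and audit replay (here: 30 days).
VACUUM bronze.customer_scd2 RETAIN 720 HOURS;

-- Keep transaction-log retention explicit on high-churn tables.
ALTER TABLE bronze.customer_scd2 SET TBLPROPERTIES (
  'delta.logRetentionDuration'         = 'interval 90 days',
  'delta.deletedFileRetentionDuration' = 'interval 30 days'
);
```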

Continue reading

From SCD2 Bronze to a Non-SCD Silver Layer in Other Tech (Iceberg, Hudi, BigQuery, Fabric)

Modern data platforms consistently separate historical truth from analytical usability by storing full SCD2 history in a Bronze layer and exposing a simplified, current-state Silver layer. Whether using Apache Iceberg, Apache Hudi, Google BigQuery, or Microsoft Fabric, the same pattern applies: Bronze preserves immutable, auditable change history, while Silver removes temporal complexity to deliver one row per business entity. Each platform implements this differently (via snapshots, incremental queries, QUALIFY, or Delta MERGE), but the architectural principle remains universal and essential for regulated environments.
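
As one concrete instance of the pattern, a hypothetical BigQuery Silver view can collapse SCD2 Bronze to one row per entity with QUALIFY; the dataset, table, and column names below are illustrative.

```sql
-- Illustrative only: current-state Silver view over SCD2 Bronze in BigQuery.
CREATE OR REPLACE VIEW silver.customer_current AS
SELECT * EXCEPT (effective_from, effective_to, row_hash)
FROM   bronze.customer_scd2
WHERE  effective_to IS NULL                       -- open rows only
QUALIFY ROW_NUMBER() OVER (
          PARTITION BY customer_id
          ORDER BY effective_from DESC
        ) = 1;                                    -- guard against duplicate open rows
```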

Continue reading

From SCD2 Bronze to a Non-SCD Silver Layer in Snowflake

This article explains a best-practice Snowflake pattern for transforming an SCD2 Bronze layer into a non-SCD Silver layer that exposes clean, current-state data. By retaining full historical truth in Bronze and using Streams, Tasks, and incremental MERGE logic, organisations can efficiently materialise one-row-per-entity Silver tables optimised for analytics. The approach simplifies governance, reduces cost, and delivers predictable performance for BI, ML, and regulatory reporting, while preserving complete auditability required in highly regulated financial services environments.
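
The sketch below illustrates the shape of that pattern with hypothetical names: a Stream on the Bronze SCD2 table feeds a scheduled Task whose MERGE keeps a one-row-per-entity Silver table current. The payload/as_of columns and the 30-minute schedule are assumptions for illustration.

```sql
-- Illustrative only: incremental Silver refresh driven by a Stream on Bronze.
CREATE OR REPLACE STREAM bronze.customer_scd2_strm ON TABLE bronze.customer_scd2;

CREATE OR REPLACE TASK silver.refresh_customer_current
  WAREHOUSE = etl_wh
  SCHEDULE  = '30 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('bronze.customer_scd2_strm')
AS
MERGE INTO silver.customer_current AS tgt
USING (
    -- the latest open row per entity among the changed records
    SELECT *
    FROM   bronze.customer_scd2_strm
    WHERE  effective_to IS NULL
      AND  METADATA$ACTION = 'INSERT'
    QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id
                               ORDER BY effective_from DESC) = 1
) src
ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET tgt.payload = src.payload, tgt.as_of = src.effective_from
WHEN NOT MATCHED THEN
    INSERT (customer_id, payload, as_of)
    VALUES (src.customer_id, src.payload, src.effective_from);

ALTER TASK silver.refresh_customer_current RESUME;
```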

Continue reading

From SCD2 Bronze to a Non-SCD Silver Layer in Databricks

This article explains a best-practice Databricks lakehouse pattern for transforming fully historical SCD2 Bronze data into clean, non-SCD Silver tables. Bronze preserves complete temporal truth for audit, compliance, and investigation, while Silver exposes simplified, current-state views optimised for analytics and data products. Using Delta Lake features such as MERGE, Change Data Feed, OPTIMIZE, and ZORDER, organisations, particularly in regulated Financial Services, can efficiently maintain audit-grade history while delivering fast, intuitive, consumption-ready datasets.
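
A minimal Databricks-flavoured sketch of the Change Data Feed leg of this pattern follows, assuming CDF is already enabled on Bronze; all names and the starting version (1042) are placeholders, and in practice the last processed version would come from a checkpoint the pipeline maintains.

```sql
-- Illustrative only: propagate new SCD2 versions from Bronze to a current-state
-- Silver table via the Delta Change Data Feed.
-- Prerequisite (one-off):
--   ALTER TABLE bronze.customer_scd2
--   SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');

MERGE INTO silver.customer_current AS tgt
USING (
    SELECT customer_id, payload, effective_from
    FROM   table_changes('bronze.customer_scd2', 1042)  -- placeholder: last processed version + 1
    WHERE  _change_type = 'insert'                      -- new SCD2 versions arrive as inserts
      AND  effective_to IS NULL
    QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id
                               ORDER BY effective_from DESC) = 1
) AS src
ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET tgt.payload = src.payload, tgt.as_of = src.effective_from
WHEN NOT MATCHED THEN
    INSERT (customer_id, payload, as_of)
    VALUES (src.customer_id, src.payload, src.effective_from);
```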

Continue reading

Operationalising SCD2 at Scale: Monitoring, Cost Controls, and Governance for a Healthy Bronze Layer

This article explains how to operationalise Slowly Changing Dimension Type 2 (SCD2) at scale in the Bronze layer of a medallion architecture, with a focus on highly regulated Financial Services environments. It outlines the three pillars needed to keep historical data trustworthy, performant, and compliant: monitoring, cost control, and governance. By tracking growth patterns, preventing meaningless updates, controlling storage and compute costs, and enforcing clear governance, organisations can ensure their Bronze layer remains a reliable, audit-grade historical asset rather than an unmanaged data swamp.
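
One way to make the monitoring pillar tangible: a hypothetical health query that tracks daily version growth and flags “no-change” versions, which should stay near zero if change detection is working. Table and column names are illustrative, and COUNT_IF assumes a Snowflake- or Databricks-style dialect.

```sql
-- Illustrative only: daily SCD2 growth and no-op version rate for a Bronze table.
SELECT DATE_TRUNC('day', effective_from)        AS load_day,
       COUNT(*)                                 AS new_versions,
       COUNT_IF(row_hash = prev_hash)           AS noop_versions   -- expect ~0
FROM (
    SELECT effective_from,
           row_hash,
           LAG(row_hash) OVER (PARTITION BY customer_id
                               ORDER BY effective_from) AS prev_hash
    FROM   bronze.customer_scd2
) v
GROUP BY load_day
ORDER BY load_day DESC;
```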

Continue reading

Advanced SCD2 Optimisation Techniques for Mature Data Platforms

Advanced SCD2 optimisation techniques are essential for mature Financial Services data platforms, where the demands of historical accuracy, regulatory traceability, and scale exceed the limits of basic SCD2 patterns. Attribute-level SCD2 significantly reduces storage and computation by tracking changes per column rather than per row. Hybrid SCD2 pipelines (combining lightweight delta logs with periodic MERGEs into the main Bronze table) minimise write amplification and improve reliability. Hash-based and probabilistic change detection eliminate unnecessary updates and accelerate temporal comparison at scale. Together, these techniques enable high-performance, audit-grade SCD2 in platforms such as Databricks, Snowflake, BigQuery, Iceberg, and Hudi, supporting the long-term data lineage and reconstruction needs of regulated UK Financial Services institutions.
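
As a flavour of the hash-based change detection idea, the hypothetical sketch below reduces the “did anything change?” test to a single digest comparison against the open Bronze row. The column list, delimiter, and SHA2 usage (Snowflake-style, defaulting to SHA-256) are illustrative assumptions; the COALESCEs keep NULLs from collapsing distinct records into the same hash input.

```sql
-- Illustrative only: detect genuinely changed records with one hash comparison
-- instead of column-by-column predicates.
SELECT s.customer_id
FROM   staging.customer s
JOIN   bronze.customer_scd2 t
  ON   t.customer_id = s.customer_id
 AND   t.effective_to IS NULL                          -- compare against the open row
WHERE  SHA2(CONCAT_WS('||',
            COALESCE(s.name, ''),
            COALESCE(s.address, ''),
            COALESCE(TO_VARCHAR(s.risk_rating), ''))) <> t.row_hash;
```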

Continue reading