Tag Archives: late arriving data

Production-Grade Testing for SCD2 & Temporal Pipelines

The testing discipline that prevents regulatory failure, data corruption, and sleepless nights in Financial Services. Slowly Changing Dimension Type 2 (SCD2) pipelines underpin regulatory reporting, remediation, risk models, and point-in-time (PIT) evidence across Financial Services, yet most are effectively untested. As data platforms adopt Change Data Capture (CDC), hybrid SCD2 patterns, and large-scale reprocessing, silent temporal defects become both more likely and harder to detect. This article sets out a production-grade testing discipline for SCD2 and temporal pipelines, focused on determinism, late data, precedence, replay, and PIT reconstruction. The goal is simple: prevent silent corruption and ensure SCD2 outputs remain defensible under regulatory scrutiny.


Event-Driven CDC to Correct SCD2 Bronze in 2025–2026

Broken history often stays hidden until remediation or skilled-person reviews. The reason is that event-driven Change Data Capture (CDC) fundamentally changes how history behaves in a data platform. When Financial Services organisations move from batch ingestion to streaming CDC, long-standing SCD2 assumptions quietly break, often without immediate symptoms. Late, duplicated, partial, or out-of-order events can silently corrupt Bronze history and undermine regulatory confidence. This article sets out what “correct” SCD2 means in a streaming world, why most implementations fail, and how to design Bronze pipelines that remain temporally accurate, replayable, and defensible under PRA/FCA scrutiny in 2025–2026.
