Tag Archives: Machine Learning Data

Databricks vs Snowflake vs Fabric vs Other Tech with SCD2 Bronze: Choosing the Right Operating Model

Choosing the right platform for implementing SCD2 in the Bronze layer is not a tooling decision but an operating model decision. At scale, SCD2 Bronze forces trade-offs around change capture, merge frequency, physical layout, cost governance, and long-term analytics readiness. Different platforms optimise for different assumptions about who owns those trade-offs. This article compares Databricks, Snowflake, Microsoft Fabric, and alternative technologies through that lens, with practical guidance for Financial Services organisations designing SCD2 Bronze layers that must remain scalable, auditable, and cost-effective over time.

From Partitioning to Liquid Clustering: Evolving SCD2 Bronze on Databricks at Scale

As SCD2 Bronze layers mature, even well-designed partitioning and ZORDER strategies can struggle under extreme scale, high-cardinality business keys, and evolving access patterns. This article examines why SCD2 Bronze datasets place unique pressure on static data layouts and introduces Databricks Liquid Clustering as a natural next step in their operational evolution. It explains when Liquid Clustering becomes appropriate, how it fits within regulated Financial Services environments, and how it preserves auditability while improving long-term performance and readiness for analytics and AI workloads.

Managing a Rapidly Growing SCD2 Bronze Layer on Snowflake: Best Practices and Architectural Guidance

Slowly Changing Dimension Type 2 (SCD2) patterns are widely used in Snowflake-based Financial Services platforms to preserve full historical change for regulatory, analytical, and audit purposes. However, Snowflake’s architecture differs fundamentally from file-oriented lakehouse systems, requiring distinct design and operational choices. This article provides practical, production-focused guidance for operating large-scale SCD2 Bronze layers on Snowflake. It explains how to use Streams, Tasks, micro-partition behaviour, batching strategies, and cost-aware configuration to ensure predictable performance, controlled spend, and long-term readiness for analytics and AI workloads in regulated environments.

Managing a Rapidly Growing SCD2 Bronze Layer on Databricks: Best Practices and Practical Guidance ready for AI Workloads

Slowly Changing Dimension Type 2 (SCD2) patterns are increasingly used in the Bronze layer of Databricks-based platforms to meet regulatory, analytical, and historical data requirements in Financial Services. However, SCD2 Bronze tables grow rapidly and can become costly, slow, and operationally fragile if not engineered carefully. This article provides practical, production-tested guidance for managing large-scale SCD2 Bronze layers on Databricks using Delta Lake. It focuses on performance, cost control, metadata health, and long-term readiness for analytics and AI workloads in regulated environments.
