Tag Archives: Lakehouse

Handling Embedded XML/JSON Blobs to Audit-Grade SCD2 Bronze

Financial services platforms routinely ingest XML and JSON payloads embedded in opaque fields, which creates tension between audit fidelity and analytical usability. This article presents a regulator-defensible approach to handling such payloads in the Bronze layer: landing raw data immutably, extracting only high-value attributes, applying attribute-level SCD2, and managing schema drift without data loss. Using hybrid flattening, temporal compaction, and disciplined lineage, banks can turn messy blobs into audit-grade Bronze assets while preserving point-in-time reconstruction and regulatory confidence.


Databricks vs Snowflake: A Critical Comparison of Modern Data Platforms

This article provides a critical, side-by-side comparison of Databricks and Snowflake, drawing on real-world experience leading enterprise data platform teams. It covers their origins, architecture, programming language support, workload fit, operational complexity, governance, AI capabilities, and ecosystem maturity. It helps architects and data leaders understand the philosophical and technical trade-offs between the AI-native flexibility and open-source alignment of Databricks and the streamlined governance and SQL-first simplicity of Snowflake. Practical recommendations, strategic considerations, and guidance by team persona help readers choose or combine the two platforms to fit their data strategy and talent strengths.
