In regulated financial services, semi-structured payloads such as XML, JSON, PDFs, and messages are not “raw data” to be discarded after parsing: they are primary evidence. This article argues that blobs must be treated as first-class artefacts: preserved intact, timestamped, queryable, and reinterpretable over time. Relational models are interpretations that evolve; original payloads anchor truth. Platforms that discard or mutate artefacts optimise for neatness today at the cost of defensibility tomorrow.
Contents
- Contents
- 1. Introduction: Why “Raw Data” Is the Wrong Mental Model
- 2. What a Blob Actually Represents
- 3. Why Regulators Care About Payloads, Not Tables
- 4. Blobs as Parallel Truth, Not Input Material
- 5. Partial Interpretation Is Not Failure
- 6. Why Blobs Must Remain Queryable
- 7. Blobs and Temporal Truth
- 8. Where Platforms Commonly Go Wrong
- 9. Blobs Do Not Replace Models… They Protect Them
- 10. What This Article Does and Does Not Claim
- 11. The Architectural Consequence
- 12. Conclusion and Closing
1. Introduction: Why “Raw Data” Is the Wrong Mental Model
Modern data platforms inherit a set of assumptions from analytics and engineering cultures that prioritise cleanliness, optimisation, and abstraction. In regulated environments, those instincts collide with a very different reality — one in which accountability is exercised long after the fact, under external scrutiny, and against the record as it originally existed.
In many data platforms, XML, JSON, PDFs, FIX messages, ISO20022 payloads, and other semi-structured artefacts are treated as raw material: something to be parsed, flattened, normalised, and eventually discarded.
In regulated financial services, this framing is dangerously wrong.
These artefacts are not “raw”.
They are primary evidence.
A mature regulated data platform treats blobs not as staging debris, but as first-class, durable artefacts with independent legal, regulatory, and analytical value.
This distinction matters because regulatory accountability is retrospective, adversarial, and evidence-driven: not model-driven.
Part of the “land it early, manage it early” series on SCD2-driven Bronze architectures for regulated Financial Services. Blobs as evidentiary anchors in regulated FS, for ingestion engineers, architects, and audit teams who need to treat payloads as parallel truth. This article gives practices to make artefacts queryable and defensible.
2. What a Blob Actually Represents
To understand why these artefacts deserve protection, it helps to step back from storage formats and pipelines and look instead at what is being captured. The significance of these objects lies less in how awkward they are to process and more in what they embody when they arrive.
A blob is not just an inconvenient data shape.
a blob is an assertion:
- made by a system, counterparty, or human
- expressed in the schema and semantics of its origin
- at a specific moment in time
Examples:
- ISO20022 payment messages
- regulatory submissions
- KYC documents
- trade confirmations
- audit extracts
- case notes and free-text narratives
These artefacts are frequently:
- the only defensible record of what was communicated
- legally binding
- reinterpretable as standards evolve
Flattening them prematurely destroys meaning.
3. Why Regulators Care About Payloads, Not Tables
Regulatory engagement does not start from the internal architecture of a firm’s data estate. It starts from externally observable actions, submissions, communications, and decisions — and works backward from there to establish responsibility, intent, and knowledge.
Under regulatory scrutiny, firms are rarely asked:
“Which table did this come from?”
They are asked:
- “What exactly was submitted?”
- “What did the counterparty send?”
- “What did the firm receive?”
- “What evidence did you have at the time?”
At some point, every regulated firm is asked to reconstruct a decision with imperfect hindsight: “Based on what you knew on this date, why was this payment released?”
That question cannot be answered from a reconciled table built months later. It can only be answered from the artefacts that existed at the time — messages received, documents reviewed, and assertions made — exactly as they were.
The answer is almost always a payload, not a relational projection.
A platform that cannot produce original artefacts — intact, timestamped, and contextualised — is already on weak footing, regardless of how clean its derived tables look.
4. Blobs as Parallel Truth, Not Input Material
Much of the confusion around blobs comes from treating them as an early phase in a linear pipeline. That framing breaks down once you accept that different representations can coexist without one invalidating the other.
The critical shift is this:
Blobs are not inputs to truth.
They are parallel representations of it.
Relational models are interpretations of blobs.
They:
- select attributes
- impose structure
- apply business meaning
- evolve over time
The blob remains the anchor.
A mature platform therefore preserves:
- the original payload
- its metadata (source, encoding, headers, schema version)
- its arrival and event timestamps
- its lineage to all downstream interpretations
This allows reinterpretation without rewriting history.
5. Partial Interpretation Is Not Failure
There is a persistent belief that usefulness and completeness are the same thing. In practice, regulated work is often driven by narrow, evolving questions rather than total semantic mastery of every artefact.
A common anti-pattern is the belief that a blob must be “fully parsed” to be useful.
In reality:
- many payloads are only partially relevant
- regulatory questions often focus on a subset of fields
- new questions emerge years later
Partial interpretation allows:
- early value without semantic lock-in
- deferred modelling decisions
- safe evolution of understanding
The cost of premature full normalisation is rigidity — and regulatory fragility.
6. Why Blobs Must Remain Queryable
Preservation alone does not guarantee accountability. Artefacts that cannot be located, connected, or examined in context may technically exist while being operationally useless when scrutiny arrives.
Treating blobs as archival objects is insufficient.
In regulated environments, blobs must be:
- discoverable
- queryable
- linkable to derived data
- accessible for investigation and replay
This does not mean everything queries blobs directly.
It means:
- the platform can traverse from a report or decision
- back to the exact artefact(s) that informed it
- without lossy transformation
Queryable blobs enable:
- forensic analysis
- dispute resolution
- retrospective reinterpretation
- model explainability
Please note: Queryable does not mean universally queried or analytically convenient; it means that artefacts can be reliably located, filtered, and referenced through content, metadata and lineage when accountability demands it.
7. Blobs and Temporal Truth
Regulatory questions are almost always temporal in nature. They hinge on what was known, believed, or communicated at a specific moment — not on what is known now, after reconciliation and correction.
Blobs are inherently temporal.
They:
- represent what was known or asserted at a time
- may later be superseded, corrected, or contradicted
- must not be overwritten
Preserving blobs aligns naturally with:
- temporal history
- point-in-time reconstruction
- belief evolution
Deleting or mutating blobs severs the evidentiary chain.
8. Where Platforms Commonly Go Wrong
Most failures in this area are not the result of negligence or bad intent. They arise from applying otherwise sensible data-engineering heuristics in environments where the cost of irreversibility is unusually high.
This principle is usually violated when:
- Raw layers are treated as temporary
- payloads are discarded after parsing
- schemas are enforced too early
- only “clean” data is retained
- storage cost is prioritised over defensibility
These choices optimise for short-term neatness at the expense of long-term accountability.
9. Blobs Do Not Replace Models… They Protect Them
Positioning blobs as first-class artefacts is sometimes misread as hostility to modelling discipline. In reality, the opposite is true: clear separation strengthens both evidence and interpretation.
Treating blobs as first-class artefacts does not mean abandoning relational models.
It means:
- models are explicitly recognised as interpretations
- interpretations can evolve
- original evidence remains intact
This separation protects both:
- analysts get clean, stable views
- regulators get unaltered evidence
The platform does not have to choose.
10. What This Article Does and Does Not Claim
At this point, it is worth drawing a firm boundary around scope. The argument being made here is conceptual and architectural, not prescriptive about tooling, layouts, or fashionable patterns.
This article does not:
- define ingestion pipelines
- mandate specific storage formats
- prescribe Raw/Base/Bronze layouts
- describe AI or RAG consumption
Those are implementation details.
This article exists to make one position explicit:
In regulated financial services, original payloads are evidence.
Evidence must be preserved, queryable, and reinterpretable.
11. The Architectural Consequence
Once artefacts are treated as durable truth rather than disposable inputs, a number of downstream design tensions resolve themselves. Decisions that once felt expensive or excessive begin to look like straightforward risk management.
Once blobs are treated as first-class artefacts:
- ingestion maximalism becomes rational
- Raw/Base/Bronze distinctions loosen naturally
- reinterpretation becomes safe
- audit trails become robust
- future questions become answerable
Most importantly, the platform stops assuming it knows today what regulators will ask tomorrow.
12. Conclusion and Closing
The long-term integrity of a regulated platform is determined less by how elegant its models are today than by how well it preserves the ability to answer tomorrow’s questions honestly.
Relational models age.
Standards change.
Interpretations evolve.
Evidence does not.
A mature regulated data platform preserves artefacts first — and derives meaning second.
Blobs are not a mess to be cleaned up.
In regulated environments, discarding or mutating them is not an optimisation choice: it is a governance decision with consequences.
Why? Because they too are the evidence on which trust, accountability, and regulatory defence ultimately rest.