Blobs as First-Class Artefacts in Regulated Data Platforms

In regulated financial services, semi-structured payloads such as XML, JSON, PDFs, and messages are not “raw data” to be discarded after parsing: they are primary evidence. This article argues that blobs must be treated as first-class artefacts: preserved intact, timestamped, queryable, and reinterpretable over time. Relational models are interpretations that evolve; original payloads anchor truth. Platforms that discard or mutate artefacts optimise for neatness today at the cost of defensibility tomorrow.

Contents
1. Introduction: Why “Raw Data” Is the Wrong Mental Model
2. What a Blob Actually Represents
3. Why Regulators Care About Payloads, Not Tables
4. Blobs as Parallel Truth, Not Input Material
5. Partial Interpretation Is Not Failure
6. Why Blobs Must Remain Queryable
7. Blobs and Temporal Truth
8. Where Platforms Commonly Go Wrong
9. Blobs Do Not Replace Models… They Protect Them
10. What This Article Does and Does Not Claim
11. The Architectural Consequence
12. Conclusion and Closing

1. Introduction: Why “Raw Data” Is the Wrong Mental Model

Modern data platforms inherit a set of assumptions from analytics and engineering cultures that prioritise cleanliness, optimisation, and abstraction. In regulated environments, those instincts collide with a very different reality — one in which accountability is exercised long after the fact, under external scrutiny, and against the record as it originally existed.

In many data platforms, XML, JSON, PDFs, FIX messages, ISO 20022 payloads, and other semi-structured artefacts are treated as raw material: something to be parsed, flattened, normalised, and eventually discarded.

In regulated financial services, this framing is dangerously wrong.

These artefacts are not “raw”.
They are primary evidence.

A mature regulated data platform treats blobs not as staging debris, but as first-class, durable artefacts with independent legal, regulatory, and analytical value.

This distinction matters because regulatory accountability is retrospective, adversarial, and evidence-driven, not model-driven.

This article is part of the “land it early, manage it early” series on SCD2-driven Bronze architectures for regulated Financial Services. It positions blobs as evidentiary anchors and is written for ingestion engineers, architects, and audit teams who need to treat payloads as parallel truth. It offers practices for making artefacts queryable and defensible.

2. What a Blob Actually Represents

To understand why these artefacts deserve protection, it helps to step back from storage formats and pipelines and look instead at what is being captured. The significance of these objects lies less in how awkward they are to process and more in what they embody when they arrive.

A blob is not just an inconvenient data shape.

It is an assertion:

made by a system, counterparty, or human
expressed in the schema and semantics of its origin
at a specific moment in time

Examples:

ISO 20022 payment messages
regulatory submissions
KYC documents
trade confirmations
audit extracts
case notes and free-text narratives

These artefacts are frequently:

the only defensible record of what was communicated
legally binding
reinterpretable as standards evolve

Flattening them prematurely destroys meaning.

3. Why Regulators Care About Payloads, Not Tables

Regulatory engagement does not start from the internal architecture of a firm’s data estate. It starts from externally observable actions, submissions, communications, and decisions — and works backward from there to establish responsibility, intent, and knowledge.

Under regulatory scrutiny, firms are rarely asked:

“Which table did this come from?”

They are asked:

“What exactly was submitted?”
“What did the counterparty send?”
“What did the firm receive?”
“What evidence did you have at the time?”

At some point, every regulated firm is asked to reconstruct a decision with imperfect hindsight: “Based on what you knew on this date, why was this payment released?”

That question cannot be answered from a reconciled table built months later. It can only be answered from the artefacts that existed at the time — messages received, documents reviewed, and assertions made — exactly as they were.

The answer is almost always a payload, not a relational projection.

A platform that cannot produce original artefacts — intact, timestamped, and contextualised — is already on weak footing, regardless of how clean its derived tables look.

4. Blobs as Parallel Truth, Not Input Material

Much of the confusion around blobs comes from treating them as an early phase in a linear pipeline. That framing breaks down once you accept that different representations can coexist without one invalidating the other.

The critical shift is this:

Blobs are not inputs to truth.
They are parallel representations of it.

Relational models are interpretations of blobs.

They:

select attributes
impose structure
apply business meaning
evolve over time

The blob remains the anchor.

A mature platform therefore preserves:

the original payload
its metadata (source, encoding, headers, schema version)
its arrival and event timestamps
its lineage to all downstream interpretations

This allows reinterpretation without rewriting history.
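
As a minimal sketch of the four preserved elements listed above, the record below keeps the payload bytes untouched and derives an identifier from their content. The class names, field names, and the choice of SHA-256 as the identifier are illustrative assumptions, not something this article prescribes.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Artefact:
    """An immutable record of a payload exactly as it arrived (hypothetical shape)."""
    payload: bytes            # original bytes, never re-encoded or reformatted
    source: str               # originating system or counterparty
    encoding: str             # declared encoding, recorded rather than applied
    schema_version: str       # schema the sender claimed to use
    event_time: datetime      # when the sender says the event occurred
    arrival_time: datetime    # when the platform received it
    headers: tuple = ()       # transport headers as (name, value) pairs
    artefact_id: str = field(init=False)

    def __post_init__(self):
        # Content-derived identifier: the same bytes always yield the same id.
        digest = hashlib.sha256(self.payload).hexdigest()
        object.__setattr__(self, "artefact_id", digest)


@dataclass(frozen=True)
class Interpretation:
    """A derived row that points back to the artefact it was read from."""
    artefact_id: str
    interpreter_version: str
    attributes: tuple         # (name, value) pairs selected by the interpreter


# Illustrative usage with made-up values.
raw = b"<Document><Amt>100.00</Amt></Document>"
received = datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc)
artefact = Artefact(
    payload=raw,
    source="example-gateway",
    encoding="utf-8",
    schema_version="v1",
    event_time=received,
    arrival_time=received,
)
row = Interpretation(artefact.artefact_id, "v1", (("amount", "100.00"),))
```

Because the record is frozen and the interpretation carries only a reference, a new interpretation can be added later without touching the stored payload.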

5. Partial Interpretation Is Not Failure

There is a persistent belief that usefulness and completeness are the same thing. In practice, regulated work is often driven by narrow, evolving questions rather than total semantic mastery of every artefact.

A common anti-pattern is the belief that a blob must be “fully parsed” to be useful.

In reality:

many payloads are only partially relevant
regulatory questions often focus on a subset of fields
new questions emerge years later

Partial interpretation allows:

early value without semantic lock-in
deferred modelling decisions
safe evolution of understanding

The cost of premature full normalisation is rigidity — and regulatory fragility.
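
A minimal sketch of partial interpretation, assuming a JSON payload: only the fields that today's question needs are extracted, and anything absent is reported as None rather than treated as a failure. The field names and paths are hypothetical.

```python
import json

# Hypothetical field paths for today's question; everything else is left alone.
FIELDS_OF_INTEREST = {
    "amount": ("payment", "amount"),
    "currency": ("payment", "currency"),
}


def interpret_partially(payload: bytes, fields=FIELDS_OF_INTEREST) -> dict:
    """Extract only the requested fields; a missing path yields None, not an error."""
    document = json.loads(payload)
    result = {}
    for name, path in fields.items():
        node = document
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        result[name] = node
    return result


# Illustrative usage with a made-up payload.
payload = b'{"payment": {"amount": "100.00", "currency": "EUR", "remittance": {"note": "inv 42"}}}'
view = interpret_partially(payload)

# A question that emerges later needs only a new path, not a new ingestion.
later = interpret_partially(payload, {"note": ("payment", "remittance", "note")})
```

The payload itself is never modified, so fields ignored today remain available when a later question asks for them.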

6. Why Blobs Must Remain Queryable

Preservation alone does not guarantee accountability. Artefacts that cannot be located, connected, or examined in context may technically exist while being operationally useless when scrutiny arrives.

Treating blobs as archival objects is insufficient.

In regulated environments, blobs must be:

discoverable
queryable
linkable to derived data
accessible for investigation and replay

This does not mean every workload queries blobs directly.

It means:

the platform can traverse from a report or decision
back to the exact artefact(s) that informed it
without lossy transformation

Queryable blobs enable:

forensic analysis
dispute resolution
retrospective reinterpretation
model explainability

Note that queryable does not mean universally queried or analytically convenient; it means that artefacts can be reliably located, filtered, and referenced through content, metadata, and lineage when accountability demands it.
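
The traversal from a report or decision back to its artefacts can be sketched as a walk over lineage edges. The identifiers and the dictionary-based lineage store below are assumptions for illustration; a real platform would hold these edges in whatever catalogue or metadata store it already uses.

```python
# Hypothetical lineage edges: each derived item lists the items it was built from.
LINEAGE = {
    "report:q3-exposure": ["row:payments/1001", "row:payments/1002"],
    "row:payments/1001": ["artefact:9f2c"],
    "row:payments/1002": ["artefact:41ab", "artefact:9f2c"],
}


def trace_to_artefacts(item: str, lineage: dict) -> list:
    """Walk lineage edges from any derived item back to the artefacts beneath it."""
    found, seen, stack = [], set(), [item]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = lineage.get(current, [])
        # An artefact is a leaf: it has no parents of its own.
        if not parents and current.startswith("artefact:"):
            found.append(current)
        stack.extend(parents)
    return sorted(found)


evidence = trace_to_artefacts("report:q3-exposure", LINEAGE)
# evidence == ["artefact:41ab", "artefact:9f2c"]
```

The walk returns references only; the artefacts themselves are fetched unchanged from wherever they are stored, so nothing lossy happens along the way.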

7. Blobs and Temporal Truth

Regulatory questions are almost always temporal in nature. They hinge on what was known, believed, or communicated at a specific moment — not on what is known now, after reconciliation and correction.

Blobs are inherently temporal.

They:

represent what was known or asserted at a time
may later be superseded, corrected, or contradicted
must not be overwritten

Preserving blobs aligns naturally with:

temporal history
point-in-time reconstruction
belief evolution

Deleting or mutating blobs severs the evidentiary chain.
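
Point-in-time reconstruction follows naturally from an append-only record. The sketch below is a deliberately small in-memory illustration under that assumption: corrections arrive as new entries, and a query asks what had arrived by a given moment. The class and method names are hypothetical.

```python
from datetime import datetime, timezone


class ArtefactLog:
    """Append-only log: corrections arrive as new entries, nothing is overwritten."""

    def __init__(self):
        self._entries = []  # (arrival_time, subject, payload) tuples in append order

    def append(self, arrival_time, subject, payload):
        self._entries.append((arrival_time, subject, payload))

    def as_of(self, subject, moment):
        """Return the latest payload for `subject` that had arrived by `moment`."""
        known = [e for e in self._entries if e[1] == subject and e[0] <= moment]
        return max(known, key=lambda e: e[0])[2] if known else None


# Illustrative usage with made-up payloads and dates.
log = ArtefactLog()
log.append(datetime(2024, 3, 1, tzinfo=timezone.utc), "payment-77", b"amount=100.00")
log.append(datetime(2024, 3, 9, tzinfo=timezone.utc), "payment-77", b"amount=10.00")

known_on_5th = log.as_of("payment-77", datetime(2024, 3, 5, tzinfo=timezone.utc))
known_on_10th = log.as_of("payment-77", datetime(2024, 3, 10, tzinfo=timezone.utc))
```

The later correction does not erase the earlier assertion: a question about 5 March still returns what the firm actually held on that date.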

8. Where Platforms Commonly Go Wrong

Most failures in this area are not the result of negligence or bad intent. They arise from applying otherwise sensible data-engineering heuristics in environments where the cost of irreversibility is unusually high.

The principle that payloads are evidence is usually violated when:

Raw layers are treated as temporary
payloads are discarded after parsing
schemas are enforced too early
only “clean” data is retained
storage cost is prioritised over defensibility

These choices optimise for short-term neatness at the expense of long-term accountability.

9. Blobs Do Not Replace Models… They Protect Them

Positioning blobs as first-class artefacts is sometimes misread as hostility to modelling discipline. In reality, the opposite is true: clear separation strengthens both evidence and interpretation.

Treating blobs as first-class artefacts does not mean abandoning relational models.

It means:

models are explicitly recognised as interpretations
interpretations can evolve
original evidence remains intact

This separation protects both:

analysts get clean, stable views
regulators get unaltered evidence

The platform does not have to choose.
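
One way to picture this separation, assuming a JSON payload and two hypothetical interpreter versions: each derived view records which interpreter produced it, and both read the same unaltered bytes.

```python
import json
from decimal import Decimal


def interpret_v1(payload: bytes) -> dict:
    """First reading: the amount is kept as the sender's string."""
    doc = json.loads(payload)
    return {"amount": doc["amount"]}


def interpret_v2(payload: bytes) -> dict:
    """Later reading: the amount as an exact decimal, plus a field v1 ignored."""
    doc = json.loads(payload)
    return {"amount": Decimal(doc["amount"]), "currency": doc.get("currency")}


# Hypothetical registry of interpreter versions.
INTERPRETERS = {"v1": interpret_v1, "v2": interpret_v2}


def derive(payload: bytes, version: str) -> dict:
    """Produce a derived view tagged with the interpreter version that produced it."""
    return {"interpreter": version, **INTERPRETERS[version](payload)}


# Illustrative usage with a made-up payload.
evidence = b'{"amount": "100.00", "currency": "EUR"}'
old_view = derive(evidence, "v1")
new_view = derive(evidence, "v2")
```

Analysts can move from the first view to the second while the evidence stays byte-for-byte what was received.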

10. What This Article Does and Does Not Claim

At this point, it is worth drawing a firm boundary around scope. The argument being made here is conceptual and architectural, not prescriptive about tooling, layouts, or fashionable patterns.

This article does not:

define ingestion pipelines
mandate specific storage formats
prescribe Raw/Base/Bronze layouts
describe AI or RAG consumption

Those are implementation details.

This article exists to make one position explicit:

In regulated financial services, original payloads are evidence.
Evidence must be preserved, queryable, and reinterpretable.

11. The Architectural Consequence

Once artefacts are treated as durable truth rather than disposable inputs, a number of downstream design tensions resolve themselves. Decisions that once felt expensive or excessive begin to look like straightforward risk management.

Once blobs are treated as first-class artefacts:

ingestion maximalism becomes rational
Raw/Base/Bronze distinctions loosen naturally
reinterpretation becomes safe
audit trails become robust
future questions become answerable

Most importantly, the platform stops assuming it knows today what regulators will ask tomorrow.

12. Conclusion and Closing

The long-term integrity of a regulated platform is determined less by how elegant its models are today than by how well it preserves the ability to answer tomorrow’s questions honestly.

Relational models age.
Standards change.
Interpretations evolve.

Evidence does not.

A mature regulated data platform preserves artefacts first — and derives meaning second.

Blobs are not a mess to be cleaned up.
In regulated environments, discarding or mutating them is not an optimisation choice: it is a governance decision with consequences.
Why? Because they are the evidence on which trust, accountability, and regulatory defence ultimately rest.

a blog by Wayne Horkan