This article explains why standard Retrieval-Augmented Generation (RAG) silently corrupts history in Financial Services by answering past questions with present-day truth. It introduces Temporal RAG: a regulator-defensible retrieval pattern that conditions every query on an explicit as_of timestamp and retrieves only from Point-in-Time (PIT) slices governed by SCD2 validity, precedence rules, and repair policies. Using concrete implementation patterns and audit reconstruction examples, it shows how to make LLM retrieval reproducible, evidential, and safe for complaints, remediation, AML, and conduct-risk use cases.
Executive Summary (TL;DR)
Why normal RAG quietly corrupts history — and how to make LLM retrieval audit-grade in regulated environments. Most Retrieval-Augmented Generation (RAG) implementations are implicitly presentist: they retrieve “the best matching chunks” from whatever the knowledge base looks like today. In Financial Services, that behaviour is not just technically wrong — it is often regulator-hostile.
If an LLM answers a complaint, remediation, AML, or conduct-risk question using today’s corrected truth rather than what the institution actually knew at the time, you have created an un-auditable narrative generator.
Temporal RAG is the missing pattern: retrieval must be conditioned on an explicit as_of timestamp, and must query a Point-in-Time (PIT) slice that reconstructs the state as known on date X, using the same precedence rules, SCD2 validity windows, and repair policies that govern your platform.
Put simply:
- Standard RAG answers “what we know now.”
- Temporal RAG answers “what we knew then.”
And in FS, that difference is everything.
Part of the “land it early, manage it early” series on SCD2-driven Bronze architectures for regulated Financial Services. This instalment covers Temporal RAG for “as known” state in FS LLMs, written for AI engineers, data scientists, and governance teams who need regulator-safe retrieval, and gives the pattern that prevents hindsight bias in AI outputs.
Contents
- Executive Summary (TL;DR)
- Contents
- 1. Introduction: The Legal Difference Between “As Known” and “As Now Known”
- 2. Why Normal RAG Is “Illegal” (or at least unacceptable) for Historical Questions in FS
- 3. The Temporal RAG Pattern
- 4. Implementing Temporal RAG on Databricks (Delta Lake)
- 5. Implementing Temporal RAG on Snowflake
- 6. Metadata You Must Embed in Every Vector (Non-Negotiable)
- 7. Audit Reconstruction Example: “Show Me Exactly What the LLM Saw”
- 8. Human-in-the-Loop: When Temporal RAG Must Not Be Autonomous
- 9. Conclusion: Temporal RAG Is the Only RAG Pattern That Survives FS Reality
1. Introduction: The Legal Difference Between “As Known” and “As Now Known”
Before discussing RAG, vectors, or LLM behaviour, it’s necessary to be precise about what “truth” means in a regulated Financial Services context. In most industries, historical reconstruction is an analytical convenience. In FS, it is a legal and operational boundary: the difference between what an institution knew at a point in time and what it can reconstruct later is not academic, because it determines whether a narrative is defensible or misleading.
There are two distinct truths:
1.1 “State as known on date X”
When regulators, auditors, or courts ask historical questions, they are not asking what the platform can infer today. They are asking what the organisation actually knew when decisions were made. This subsection grounds that concept in concrete FS use cases, where the answer must reflect contemporaneous knowledge, not corrected hindsight.
This is the state your institution actually held at the time decisions were made.
- What risk score did we hold on the day we gave advice?
- What KYC attributes were present when the transaction was screened?
- What affordability inputs were used when credit was granted?
This truth is what matters for:
- Consumer Duty look-backs
- complaints and remediation
- conduct risk investigations
- AML / sanctions / PEP decisions
- operational resilience replay (“what did our systems know during incident Y?”)
1.2 “State as now known about date X”
Modern data platforms are very good at reconstructing improved versions of the past. That capability is valuable — but it is also dangerous when used without care. This subsection explains the alternative truth most platforms default to, and why it must not be mistaken for “as known” truth in regulated scenarios.
This is the state you can reconstruct today, after backfills, corrections, late-arriving data, and restatements.
It is often useful for:
- model improvement
- backtesting
- reconciliations
- “what should have happened” analysis
But it is not the same as “as known”. In regulated contexts, conflating them creates a compliance and evidential risk.
FS Reality Check
This distinction is not theoretical. It is exactly where many otherwise well-engineered platforms fail under regulatory scrutiny. This reality check shows how easily the two concepts are conflated in practice, and why regulators care deeply about the difference.
This is the precise gap that causes platforms to fail in s166 reviews:
- The bank presents a reconstruction that includes data they did not actually have at the time.
- The regulator (or Skilled Person) asks: “Can you prove this is what you knew then, not what you know now?”
If you cannot answer that with evidence, your platform is not defensible.
2. Why Normal RAG Is “Illegal” (or at least unacceptable) for Historical Questions in FS
RAG is often presented as a harmless retrieval layer that simply “grounds” an LLM in data. In regulated environments, that assumption is wrong. Retrieval is not neutral — it encodes assumptions about time, truth, and authority. If those assumptions are implicit or uncontrolled, RAG becomes a mechanism for historical distortion rather than explanation.
I’ll be blunt because it matters: most RAG implementations are not fit for regulated historical queries. They are ‘illegal’ in effect, even if not explicitly prohibited in statute.
Why?
Because standard RAG works like this:
- Embed documents/chunks from your knowledge base (today’s corpus).
- Run similarity search.
- Retrieve “top K” matches.
- Feed to the LLM.
- Output an answer.
This is fine for general corporate knowledge. It is unsafe in FS for historical questions because:
- Your knowledge base changes over time.
- Corrections and backfills overwrite meaning.
- Your vector store becomes a “latest truth index”.
The resulting LLM behaviour is predictable:
- It answers past questions with present-day truth.
- It invents coherence across time.
- It produces narratives that are hard to reproduce later.
2.1 The Silent Failure Mode
The most dangerous failures are not noisy. They produce confident, plausible answers that appear internally consistent and are therefore easy to trust. This subsection walks through how normal RAG quietly substitutes present-day truth for historical knowledge — and why that failure often goes unnoticed until it matters most.
A complaint handler asks:
“What risk profile did we hold when we sold this product in March 2022?”
Standard RAG retrieves today’s “Customer Risk Profile” chunk — which reflects:
- later KYC updates
- late-arriving corrections
- precedence changes
- restated AML classifications
The LLM produces a clean answer — and you’ve now:
- embedded a false historical claim into a regulated process, and
- created an output you can’t defend if challenged.
Even if your data platform is perfectly built, standard RAG can still corrupt the regulatory narrative, because it sits above the platform and ignores temporal constraints.
3. The Temporal RAG Pattern
Once the failure mode is clear, the solution is conceptually simple but architecturally non-negotiable. Retrieval must be bound to time in the same way the underlying data platform is. Temporal RAG is not a new model or tool — it is a discipline that aligns LLM retrieval with point-in-time reconstruction, precedence rules, and temporal governance.
Temporal RAG is just standard RAG with one additional requirement:
Every retrieval must be conditioned on an explicit as_of timestamp, and must retrieve only from the PIT slice valid at that time.
3.1 Temporal RAG Flow
Rather than describing tools or products, it is more useful to describe the invariant flow that makes Temporal RAG work. This subsection sets out the minimal sequence that ensures retrieval reflects “state as known”, not “state as now known”.
Query → as_of_timestamp → PIT view → vectorise that slice only → retrieve → feed LLM
In practice, you implement this as one of three patterns:
Pattern A — On-demand PIT slice → retrieval
Some use cases demand precision over speed. This pattern introduces the most conservative and defensible approach, where retrieval scope is deliberately narrow and constructed per question.
- Build PIT slice for the entity/time window
- Retrieve within that slice only
- Best for investigations / complaints where scope is narrow but precision is critical
Pattern B — Pre-materialised PIT snapshots (daily/weekly)
Other use cases trade some flexibility for throughput. This pattern introduces pre-built temporal snapshots that can be reused safely at scale, provided their construction is governed.
- Materialise daily PIT snapshots in Silver (or an investigative zone)
- Embed snapshots
- Best for high-volume workflows: conduct risk look-backs, repeated complaint themes
Pattern C — Temporal index with validity metadata filtering
In some architectures, storage cost matters less than repeated computation. This pattern keeps a single index and enforces temporal correctness at retrieval time: it trades higher storage and metadata overhead for lower compute cost at query time, and suits cases where repeated re-embedding is undesirable but strict temporal filtering can be enforced reliably.
- Store vectors once, but attach:
  - effective_from / effective_to
  - reconstruction_rule_version
- Retrieval filters by “as_of”
- Best when storage cost is more acceptable than constantly re-embedding
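Pattern C can be sketched in a few lines. A minimal illustration, assuming an in-memory list of vector records (a real deployment would use your vector store’s metadata filters); `VectorRecord`, `cosine`, and `temporal_retrieve` are illustrative names, and the load-bearing detail is that the hard temporal filter runs before similarity ranking:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class VectorRecord:
    chunk_id: str
    embedding: list          # vector used for similarity search
    effective_from: datetime
    effective_to: datetime   # exclusive upper bound, per SCD2 convention
    reconstruction_rule_version: str

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def temporal_retrieve(index, query_vec, as_of, top_k=3):
    # 1. Hard temporal filter: only chunks valid at as_of are candidates.
    candidates = [r for r in index
                  if r.effective_from <= as_of < r.effective_to]
    # 2. Similarity ranking runs only inside the PIT-valid subset.
    return sorted(candidates,
                  key=lambda r: cosine(r.embedding, query_vec),
                  reverse=True)[:top_k]

index = [
    VectorRecord("risk_v1", [1.0, 0.0], datetime(2022, 1, 1),
                 datetime(2022, 6, 1), "r1"),
    VectorRecord("risk_v2", [1.0, 0.1], datetime(2022, 6, 1),
                 datetime(9999, 1, 1), "r1"),
]
# Only the chunk valid in March 2022 is retrievable for an as_of in March 2022.
hits = temporal_retrieve(index, [1.0, 0.0], datetime(2022, 3, 15))
```

Note that the later version (`risk_v2`) may be a better semantic match; it is still excluded, because temporal validity is a filter, not a ranking signal.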
Temporal RAG Flow Pattern Conclusion
All three work. Which you use depends on performance constraints and regulatory exposure.
Lifecycle Note:
- Where PIT snapshots or temporally-filtered indices are embedded, vectors must be invalidated and re-embedded when underlying data is corrected, restated, or subject to temporal repair.
- Failing to do so creates a stale historical index that no longer reflects “state as known”.
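The lifecycle rule above can be sketched as an invalidation hook. A minimal illustration, assuming each embedded chunk remembers the entity key and validity window it was built from (the dict-based store, `stale` flag, and `invalidate_on_repair` name are all hypothetical):

```python
from datetime import datetime

# Hypothetical vector-store rows: each chunk carries its source entity
# and the validity window of the data it embedded.
vectors = [
    {"chunk_id": "c1", "customer_id": "A", "effective_from": datetime(2022, 1, 1),
     "effective_to": datetime(2022, 6, 1), "stale": False},
    {"chunk_id": "c2", "customer_id": "A", "effective_from": datetime(2022, 6, 1),
     "effective_to": datetime(9999, 1, 1), "stale": False},
    {"chunk_id": "c3", "customer_id": "B", "effective_from": datetime(2022, 1, 1),
     "effective_to": datetime(9999, 1, 1), "stale": False},
]

def invalidate_on_repair(vectors, customer_id, repair_from, repair_to):
    """Mark every vector whose validity window overlaps the repaired
    window as stale, so a re-embedding job can rebuild it from the
    corrected PIT slice."""
    flagged = []
    for v in vectors:
        overlaps = (v["effective_from"] < repair_to
                    and repair_from < v["effective_to"])
        if v["customer_id"] == customer_id and overlaps:
            v["stale"] = True
            flagged.append(v["chunk_id"])
    return flagged

# A late-arriving correction touches customer A's record for Feb-Apr 2022:
# only the chunk embedded from that window needs rebuilding.
flagged = invalidate_on_repair(vectors, "A",
                               datetime(2022, 2, 1), datetime(2022, 4, 1))
```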
4. Implementing Temporal RAG on Databricks (Delta Lake)
The theory of Temporal RAG only holds if temporal logic remains inside the data platform, not pushed into prompts or application code. This section shows how that principle translates into a concrete Databricks implementation, using Delta Lake’s strengths to preserve auditability and replay.
The key design goal is: PIT logic stays in the platform, not inside the LLM layer.
4.1 Build a PIT slice from Bronze SCD2
Everything starts with a correct point-in-time slice. This subsection anchors the discussion in canonical SCD2 semantics, showing how “as known” state is derived before any AI interaction occurs.
Assume Bronze contains SCD2 rows with validity windows:
- effective_from
- effective_to
- is_current (optional convenience, not authoritative)
A canonical PIT filter:
-- Customer attributes as known on :as_of_ts
SELECT *
FROM bronze.customer_scd2
WHERE customer_id = :customer_id
AND :as_of_ts >= effective_from
AND :as_of_ts < effective_to;
If you have multi-source precedence, apply it in the PIT view (simplified example):
WITH candidates AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY customer_id, attribute_name
ORDER BY precedence_rank ASC, source_timestamp DESC
) AS rn
FROM bronze.customer_attribute_values
WHERE :as_of_ts >= effective_from
AND :as_of_ts < effective_to
)
SELECT *
FROM candidates
WHERE rn = 1;
You now have a PIT-correct “as known” state.
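The PIT filter can be demonstrated end to end with a toy SCD2 table. A minimal sketch using SQLite rather than Delta Lake (column names mirror the example above; the half-open window `[effective_from, effective_to)` is the load-bearing detail):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_scd2 (
        customer_id TEXT, risk_score INTEGER,
        effective_from TEXT, effective_to TEXT)
""")
# Two versions of the same customer: the risk score was restated in May 2022.
conn.executemany(
    "INSERT INTO customer_scd2 VALUES (?, ?, ?, ?)",
    [("C1", 42, "2022-01-01", "2022-05-01"),
     ("C1", 77, "2022-05-01", "9999-12-31")])

def pit_risk_score(conn, customer_id, as_of_ts):
    # Half-open window: an as_of exactly on the boundary selects the
    # new version, so no timestamp ever matches two rows.
    row = conn.execute("""
        SELECT risk_score FROM customer_scd2
        WHERE customer_id = :customer_id
          AND :as_of_ts >= effective_from
          AND :as_of_ts <  effective_to
    """, {"customer_id": customer_id, "as_of_ts": as_of_ts}).fetchone()
    return row[0]

# "What risk score did we hold in March 2022?" -> the pre-restatement value.
march_score = pit_risk_score(conn, "C1", "2022-03-15")
today_score = pit_risk_score(conn, "C1", "2024-01-01")
```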
4.2 Materialise a PIT slice for embedding (recommended)
Embedding directly from raw or dynamic views introduces unnecessary risk. This subsection explains why materialisation is a control mechanism, not an optimisation, and why it matters for audit and reproducibility.
Rather than embedding Bronze directly, write to an “AI-ready” table:
CREATE OR REPLACE TABLE silver.ai_pit_customer_snapshot AS
SELECT
customer_id,
:as_of_ts AS as_of_ts,
to_json(named_struct(
'name', name,
'address', address,
'risk_score', risk_score,
'kyc_flags', kyc_flags
)) AS pit_payload_json,
source_system,
precedence_rank,
effective_from,
effective_to,
reconstruction_rule_version
FROM <pit_view>;
From here, you chunk pit_payload_json, embed it, and index it.
4.3 What matters in Databricks
Rather than cataloguing features, this subsection highlights the few aspects of Databricks that materially affect whether a Temporal RAG implementation is defensible or fragile.
- PIT slices should be produced by governed jobs, not ad hoc notebooks.
- Delta Lake gives you auditability and replay.
- Store the embedding job run-id and link it to the PIT slice.
5. Implementing Temporal RAG on Snowflake
Snowflake’s modelling and query semantics make point-in-time reconstruction particularly readable. This section mirrors the Databricks discussion, showing that the pattern is portable across platforms as long as the same temporal discipline is enforced.
Snowflake makes PIT reconstruction clean, especially with structured modelling.
5.1 PIT slice using validity windows
This subsection establishes the baseline PIT logic in Snowflake terms, keeping the focus on semantics rather than syntax.
SELECT *
FROM bronze.customer_scd2
WHERE customer_id = :customer_id
AND :as_of_ts >= effective_from
AND :as_of_ts < effective_to;
5.2 Precedence selection using QUALIFY (highly readable)
Where multiple sources compete, precedence must be explicit and deterministic. This subsection shows how Snowflake’s syntax makes that logic transparent — and therefore easier to defend.
SELECT *
FROM candidate_attribute_values
WHERE :as_of_ts >= effective_from
AND :as_of_ts < effective_to
QUALIFY ROW_NUMBER() OVER (
PARTITION BY customer_id, attribute_name
ORDER BY precedence_rank ASC, source_timestamp DESC
) = 1;
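The QUALIFY selection above is easy to mirror in application code when unit-testing precedence rules. A minimal sketch, assuming in-memory rows with the same columns (`pick_winners` is an illustrative test harness, not a replacement for doing precedence in the platform):

```python
rows = [
    # (customer_id, attribute_name, value, precedence_rank, source_timestamp)
    ("C1", "risk_score", 42, 2, "2022-03-01"),    # lower-precedence source
    ("C1", "risk_score", 40, 1, "2022-02-01"),    # rank 1 beats rank 2
    ("C1", "address", "Old St", 1, "2022-01-01"),
    ("C1", "address", "New St", 1, "2022-02-15"), # same rank: latest wins
]

def pick_winners(rows):
    """Mirror ROW_NUMBER() OVER (PARTITION BY customer_id, attribute_name
    ORDER BY precedence_rank ASC, source_timestamp DESC) = 1."""
    winners = {}
    for cust, attr, value, rank, ts in rows:
        key = (cust, attr)
        best = winners.get(key)
        # Keep the candidate with the lowest rank; break ties on the
        # later source timestamp (ISO strings compare correctly).
        if (best is None or rank < best[1]
                or (rank == best[1] and ts > best[2])):
            winners[key] = (value, rank, ts)
    return {k: v[0] for k, v in winners.items()}

resolved = pick_winners(rows)
```

Note the deliberate asymmetry in the ordering: precedence rank dominates recency, so a stale value from an authoritative source still beats a fresher value from a lower-ranked one.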
5.3 Materialise to an “AI-ready” PIT snapshot table
As with Databricks, materialisation is the point where governance is enforced. This subsection explains how to structure that boundary cleanly in Snowflake.
CREATE OR REPLACE TABLE silver.ai_pit_customer_snapshot AS
SELECT
customer_id,
:as_of_ts AS as_of_ts,
OBJECT_CONSTRUCT(
'name', name,
'address', address,
'risk_score', risk_score,
'kyc_flags', kyc_flags
) AS pit_payload,
source_system,
precedence_rank,
effective_from,
effective_to,
reconstruction_rule_version
FROM <pit_view>;
Embed pit_payload and store the metadata.
Snowflake’s strength here is how clean it is to implement precedence and PIT selection at query time — but you still need to control embedding scope.
6. Metadata You Must Embed in Every Vector (Non-Negotiable)
Temporal correctness alone is not enough. If retrieved chunks cannot be traced back to governed source records, the system still fails under scrutiny. This section defines the minimum metadata required to turn vector retrieval into regulatory evidence rather than unverifiable context.
If you want Temporal RAG to be audit-grade, every vector/chunk must carry:
- source_system
- precedence_rank
- effective_from
- effective_to
- as_of_ts (or snapshot date)
- reconstruction_rule_version
- bronze_row_pointer (PK / business key + version identifiers)
- sensitivity_classification (PII, PCI, etc.)
This metadata is not optional — it is what turns AI retrieval from “knowledge search” into “regulatory evidence”.
Emerging Risk:
Where synthetic or augmented data is introduced into RAG corpora, it must be clearly labelled and excluded from Temporal RAG used for regulated historical narratives.
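These requirements can be enforced mechanically at indexing time. A minimal sketch, assuming chunks arrive as dicts (`validate_chunk`, `indexable`, and the `is_synthetic` flag are illustrative names); a chunk missing any audit field, or flagged as synthetic, never reaches the regulated index:

```python
REQUIRED_METADATA = {
    "source_system", "precedence_rank", "effective_from", "effective_to",
    "as_of_ts", "reconstruction_rule_version", "bronze_row_pointer",
    "sensitivity_classification",
}

def validate_chunk(chunk: dict) -> list:
    """Return the sorted list of missing audit fields; empty means index-ready."""
    return sorted(REQUIRED_METADATA - chunk.keys())

def indexable(chunk: dict) -> bool:
    # Reject chunks missing audit metadata, and exclude synthetic or
    # augmented content from indices used for regulated historical narratives.
    return not validate_chunk(chunk) and not chunk.get("is_synthetic", False)

good = {
    "text": "Risk profile as known 2022-03-15 ...",
    "source_system": "crm", "precedence_rank": 1,
    "effective_from": "2022-01-01", "effective_to": "2022-05-01",
    "as_of_ts": "2022-03-15", "reconstruction_rule_version": "v3",
    "bronze_row_pointer": "customer_scd2:C1:v12",
    "sensitivity_classification": "PII",
}
bad = {"text": "orphan chunk with no lineage"}
```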
FS Reality Check
At audit time, vague assurances about “grounding” are worthless. What matters is whether you can point to exact rows, rules, and versions. This reality check makes explicit what happens when that metadata is missing.
If you cannot answer:
“Which exact SCD2 rows did the model rely on?”
…you don’t have Temporal RAG. You have a storytelling engine.
7. Audit Reconstruction Example: “Show Me Exactly What the LLM Saw”
This is the question that ultimately matters. Everything in the architecture either enables or prevents a clean answer to it. This section walks through what a successful reconstruction looks like in practice, without relying on hypothetical tooling.
A regulator (or internal audit) asks:
“This complaint response was issued on 2024-06-15.
Show me exactly what the LLM saw when it generated the narrative.”
A Temporal RAG platform must be able to produce:
- The prompt text and model version
- The as_of timestamp used
- The retrieved chunks (exact text)
- The vector metadata for each chunk
- The source table/row pointers back to Bronze
- The precedence / reconstruction rule versions used
- The full output presented to the user
This is not theoretical. This is what “defensible AI” means in regulated FS.
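The evidence bundle above has to be written at generation time, not reconstructed later. A minimal sketch of such a record (field names are illustrative; the content hash lets an auditor verify the record was not edited after the fact):

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(prompt, model_version, as_of_ts, retrieved_chunks, output):
    """Bundle everything the LLM saw (and produced) into one
    hash-stamped evidence record, captured at generation time."""
    record = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "model_version": model_version,
        "as_of_ts": as_of_ts,
        "retrieved_chunks": [
            {"text": c["text"],
             "bronze_row_pointer": c["bronze_row_pointer"],
             "reconstruction_rule_version": c["reconstruction_rule_version"]}
            for c in retrieved_chunks
        ],
        "output": output,
    }
    # Deterministic serialisation so the hash is reproducible.
    payload = json.dumps(record, sort_keys=True)
    record["content_sha256"] = hashlib.sha256(payload.encode()).hexdigest()
    return record

rec = audit_record(
    prompt="What risk profile did we hold in March 2022?",
    model_version="model-x-2024-05",
    as_of_ts="2022-03-15",
    retrieved_chunks=[{"text": "risk_score=42 ...",
                       "bronze_row_pointer": "customer_scd2:C1:v12",
                       "reconstruction_rule_version": "v3"}],
    output="As of 2022-03-15 the held risk score was 42.",
)
```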
8. Human-in-the-Loop: When Temporal RAG Must Not Be Autonomous
Even perfectly temporal retrieval does not eliminate the need for judgement. Some classes of questions are inherently sensitive, and LLM outputs in those contexts must be mediated. This section explains where Temporal RAG should remain decision support rather than automated truth.
There are query classes where Temporal RAG outputs should be treated as decision support, not automatic truth.
Examples:
- complaints and redress
- PEP / sanctions narratives
- large exposure decisions
- high-risk customer downgrades
- product suitability explanations
A mature implementation adds triggers:
- query classification
- risk scoring
- mandatory reviewer sign-off
- suppression of unverifiable outputs
Temporal correctness is necessary — it is not sufficient.
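The triggers above reduce to a gate in front of the output. A minimal sketch, with a toy keyword classifier standing in for a governed classification model (`classify`, `dispatch`, and the class names are illustrative):

```python
# Query classes that must never be answered autonomously.
REVIEW_REQUIRED = {"complaints", "sanctions", "suitability"}

def classify(query: str) -> str:
    """Toy keyword classifier; a real system would use a governed,
    versioned model with its own audit trail."""
    q = query.lower()
    if "complaint" in q or "redress" in q:
        return "complaints"
    if "sanction" in q or "pep" in q:
        return "sanctions"
    if "suitab" in q:
        return "suitability"
    return "general"

def dispatch(query: str, draft_answer: str) -> dict:
    query_class = classify(query)
    if query_class in REVIEW_REQUIRED:
        # The draft is routed to a human reviewer, never released directly.
        return {"status": "pending_review", "class": query_class,
                "draft": draft_answer}
    return {"status": "released", "class": query_class,
            "answer": draft_answer}

auto = dispatch("What was the FX rate policy in 2021?", "draft text")
gated = dispatch("Draft the redress letter for complaint 4412", "draft text")
```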
9. Conclusion: Temporal RAG Is the Only RAG Pattern That Survives FS Reality
By this point, the conclusion should feel inevitable. This final section ties Temporal RAG back to the broader platform doctrine: time, precedence, lineage, and accountability are not optional layers — they are what make AI usable at all in regulated Financial Services.
If you’re building LLM systems for Financial Services, you have to internalise something uncomfortable:
A normal RAG system will eventually produce a historically incorrect answer.
And in FS, “eventually” means “at the exact worst moment.”
Temporal RAG fixes that by aligning retrieval with your platform’s strongest architectural assets:
- SCD2 truth in Bronze
- PIT reconstruction rules
- multi-source precedence
- late-arriving data repair
- lineage and governance
It ensures the LLM can answer:
- what the institution knew then
- not what the platform knows now
And it makes outputs reproducible, auditable, and defensible.
In 2025–2026, that is the line between:
- “LLMs as a toy”, and
- “LLMs as a regulated capability”.