2026-03-22 · corecrux · projections · architecture · databases · gpu

The Database That Remembers Why

What CoreCrux Projections are, how they differ from traditional databases, and why a GPU-backed append-only event spine quietly solves problems that most systems don't know they have.

Every database you've ever used makes the same promise: store this, give it back when I ask. Some are faster than others. Some scale further. Some have better query languages, more flexible schemas, more elegant APIs.

But they all share an assumption so deeply embedded that most engineers never think to question it: the current state is the truth.

When you update a row, the old value disappears. When you delete a record, it's gone. The database tells you what things look like right now, and it does that job well. What it doesn't tell you is how things got here, whether the current state is still valid, or what happens downstream when something changes.

For most applications, that's fine. For the kind of system we're building, it's not.

A document walks into a database

Let's follow a document through a traditional system. Say it's a security policy. Version 1.0. It gets ingested, chunked, embedded, and stored. A user asks a question about security policies, the system retrieves it, and everything works.

A month later, someone publishes version 2.0. It gets ingested alongside version 1.0. Now there are two versions in the database, and the system has no intrinsic way to know that one supersedes the other. It knows they both exist. It knows their embeddings. It might notice they're textually similar. But the relationship, that v2.0 replaces v1.0, isn't something the database understands. It's metadata at best, a comment at worst, and usually nothing at all.

Now someone asks "what's our current security policy?" The system retrieves both versions and picks whichever one the language model finds most compelling. If you're lucky, that's v2.0. If you're not, it's v1.0 because v1.0 was more concisely written and the model preferred its clarity.

Nobody notices. The answer looks fine. It is fine, right up until someone makes a decision based on a policy that was retired six weeks ago.

This is not a retrieval problem. It's a state problem. The database stored both documents faithfully. It just doesn't understand what happened between them.

What if the database remembered?

CoreCrux is an append-only event spine. That sentence is doing a lot of work, so let me unpack it.

Append-only means nothing is ever overwritten. When version 2.0 of that security policy arrives, it doesn't replace version 1.0 in the store. Instead, a new event is written: "v2.0 supersedes v1.0." Both documents still exist. Both are still retrievable. But the relationship between them is now a first-class entity in the system, as real and as queryable as the documents themselves.

Event spine means the source of truth is the sequence of events, not the current state. The state is derived. You can always reconstruct it by replaying the events from the beginning. The same events, replayed in the same order, produce the same state. Every time. Deterministically.

This distinction sounds academic until you need to answer a question like: "What did the system know about this document at 2pm on Tuesday?" In a traditional database, you'd need to reconstruct that from logs, backups, or audit tables bolted on after the fact. In CoreCrux, the answer is already there. You replay the event stream up to Tuesday at 2pm, and the state at that moment materialises in front of you. No guessing. No approximation.
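The replay idea above can be sketched in a few lines. This is a minimal illustration of the principle, not CoreCrux's actual schema: the event shape, event kinds, and field names here are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical event shape; CoreCrux's real schema is not shown in this post.
@dataclass(frozen=True)
class Event:
    ts: datetime
    kind: str            # e.g. "ingested", "superseded"
    artifact: str
    payload: dict = field(default_factory=dict)

def replay(events, until=None):
    """Fold an ordered event log into derived state.

    The same events in the same order always produce the same state,
    so stopping at `until` reconstructs the state at that moment."""
    state = {}
    for ev in sorted(events, key=lambda e: e.ts):
        if until is not None and ev.ts > until:
            break
        if ev.kind == "ingested":
            state[ev.artifact] = "active"
        elif ev.kind == "superseded":
            state[ev.payload["old"]] = "superseded"
            state[ev.artifact] = "active"
    return state

log = [
    Event(datetime(2026, 1, 5), "ingested", "policy-v1"),
    Event(datetime(2026, 2, 10), "superseded", "policy-v2", {"old": "policy-v1"}),
]

# State as of 1 February: only v1 exists, and it is active.
print(replay(log, until=datetime(2026, 2, 1)))
# State after full replay: v1 superseded, v2 active.
print(replay(log))
```

Nothing here is overwritten: "what did the system know on 1 February" is just a replay that stops early.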

Enter Projections

If the event stream is the source of truth, Projections are the lenses you look through to make sense of it.

A Projection is a derived state view. It reads events from the stream, applies a pure function to each one, and maintains a running summary. Think of it as a continuously updated answer to a specific question about the state of the world.
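In code terms, a Projection is nothing more exotic than a pure fold over the stream: new state = apply(old state, event). A toy sketch, with illustrative names:

```python
from functools import reduce

# A Projection is a pure fold over the event stream:
# state' = apply(state, event). Names here are illustrative.
def make_projection(apply, initial):
    def run(events):
        return reduce(apply, events, initial)
    return run

# Example Projection: count events seen per artifact.
def count_apply(state, event):
    new = dict(state)  # pure: never mutate the previous state
    new[event["artifact"]] = new.get(event["artifact"], 0) + 1
    return new

event_counts = make_projection(count_apply, initial={})
stream = [{"artifact": "a"}, {"artifact": "b"}, {"artifact": "a"}]
print(event_counts(stream))  # {'a': 2, 'b': 1}
```

Because `apply` is pure, replaying the same stream always yields the same summary, which is what makes the deterministic-replay guarantee possible.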

CoreCrux has four Projections, and each one exists because of a question that traditional databases can't answer well.

Artifact Living State. Is this document still alive?

Not "does it exist in the database." It exists. But is it active, currently the best representation of the thing it describes? Is it stale, still technically current but showing its age? Is it contested, with another document claiming to be a better answer? Is it superseded, formally replaced by something newer? Is it deprecated, marked for removal?

The Living State Projection tracks this lifecycle for every artifact in the system. When version 2.0 supersedes version 1.0, the Projection marks v1.0 as superseded and v2.0 as active. When a contradictory document is ingested, both are marked as contested until the conflict is resolved. The system doesn't just store documents. It understands where each one sits in its lifecycle.
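The lifecycle transitions described above amount to a small state machine. Here is one way to sketch it; the event vocabulary (`ingested`, `supersedes`, `contradicts`, `deprecate`) is assumed for illustration, not taken from CoreCrux's actual API.

```python
# Illustrative lifecycle transitions for a Living State Projection.
STATUSES = {"active", "stale", "contested", "superseded", "deprecated"}

def apply_living_state(state, event):
    """Pure transition: returns a new {artifact: status} mapping."""
    new = dict(state)
    kind, art = event["kind"], event["artifact"]
    if kind == "ingested":
        new[art] = "active"
    elif kind == "supersedes":
        new[art] = "active"
        new[event["target"]] = "superseded"
    elif kind == "contradicts":
        # Both sides are contested until the conflict is resolved.
        new[art] = "contested"
        new[event["target"]] = "contested"
    elif kind == "deprecate":
        new[art] = "deprecated"
    return new

events = [
    {"kind": "ingested", "artifact": "v1"},
    {"kind": "supersedes", "artifact": "v2", "target": "v1"},
]
state = {}
for ev in events:
    state = apply_living_state(state, ev)
print(state)  # {'v1': 'superseded', 'v2': 'active'}
```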

To put this in familiar terms: if a traditional database is a filing cabinet, the Living State Projection is the person who knows which files in the cabinet are current, which are outdated, and which are actively being argued about. Most organisations have that person. Most databases don't.

Artifact Relations. How are these documents connected?

Documents don't exist in isolation. A policy has implementations. A decision has consequences. An incident report leads to a post-mortem, which leads to a remediation plan, which leads to an updated runbook. In a traditional database, these connections live in a developer's head, or maybe in a join table that someone built once and nobody maintains.

The Relations Projection tracks eight types of connections: supports, contradicts, supersedes, duplicates, elaborates, derived_from, cites, and about_same_entity. Each relation carries a confidence score and a reference to the evidence that established it.

When the retrieval engine finds a document, it can follow the Relations Projection to find amendments filed against it, implementations that reference it, and contradictions that challenge it. Text similarity alone would miss most of these. A regulation written in legal language and a technical implementation written in systems language have almost no textual overlap, but they're deeply connected. The Projection knows that.
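A minimal sketch of that traversal, using the eight relation types named above. The edge shape, confidence threshold, and example identifiers are all assumptions for illustration:

```python
from collections import defaultdict

RELATION_TYPES = {
    "supports", "contradicts", "supersedes", "duplicates",
    "elaborates", "derived_from", "cites", "about_same_entity",
}

# Each edge carries a confidence score and a reference to the
# evidence that established it. Shapes here are illustrative.
relations = defaultdict(list)

def add_relation(src, rel, dst, confidence, evidence):
    assert rel in RELATION_TYPES
    relations[src].append({"rel": rel, "dst": dst,
                           "confidence": confidence, "evidence": evidence})

def related(artifact, rel, min_confidence=0.5):
    """Follow typed edges outward from a retrieved document."""
    return [e["dst"] for e in relations[artifact]
            if e["rel"] == rel and e["confidence"] >= min_confidence]

add_relation("gdpr-art-32", "about_same_entity", "crypto-runbook", 0.9, "evt-17")
add_relation("gdpr-art-32", "contradicts", "legacy-policy", 0.7, "evt-21")

# A regulation and a runbook share almost no vocabulary,
# but the typed edge connects them anyway.
print(related("gdpr-art-32", "about_same_entity"))  # ['crypto-runbook']
```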

Pressure Events. What's pushing this document to change?

This is the one that surprises people.

Most systems treat documents as static until someone explicitly updates them. CoreCrux tracks pressure, signals that a document's current state may not be sustainable. A pressure event might be a new regulation that affects an existing policy, a dependency that's been deprecated, or a related document that's been superseded.

Each pressure event has a lifecycle: observed, acknowledged, action_taken, resolved. The Projection tracks where each pressure signal sits in that lifecycle, and computes an aggregate pressure level for each artifact.

Think of it like a structural engineer monitoring load on a bridge. The bridge is fine right now. But if three new pressure signals arrive and none of them are acknowledged, the system knows that this artifact is under increasing strain. It hasn't broken yet. But someone should probably look at it.
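The aggregation itself can be sketched simply. The lifecycle stages come from the post; the weights and the idea of summing them are my illustrative assumptions, not CoreCrux's actual formula:

```python
# Illustrative pressure aggregation; the weights are assumptions.
LIFECYCLE = ["observed", "acknowledged", "action_taken", "resolved"]

def pressure_level(signals):
    """Aggregate unresolved pressure on one artifact.

    Unacknowledged signals weigh more than acknowledged ones;
    resolved signals drop out entirely."""
    weights = {"observed": 1.0, "acknowledged": 0.5,
               "action_taken": 0.25, "resolved": 0.0}
    return sum(weights[s["stage"]] for s in signals)

signals = [
    {"cause": "new regulation", "stage": "observed"},
    {"cause": "deprecated dependency", "stage": "observed"},
    {"cause": "related doc superseded", "stage": "acknowledged"},
]
print(pressure_level(signals))  # 2.5: not broken, but under strain
```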

Artifact Dependents. What breaks if this changes?

The flip side of knowing what a document depends on is knowing what depends on it. The Dependents Projection maintains a reverse index: for every artifact, which answers, collections, and downstream artifacts reference it?

This is the question that keeps compliance officers awake. If you update a foundational policy, what answers were produced using the old version? Which downstream systems ingested those answers? How far does the blast radius extend?

In a traditional database, answering this requires tracing through application logs, API call records, and probably a few spreadsheets. The Dependents Projection answers it with a single lookup.
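A reverse index of this kind is straightforward to sketch. The event names and identifiers here are hypothetical; the point is that the index is maintained at write time, so the blast-radius question really is one lookup:

```python
from collections import defaultdict

# Reverse index: artifact -> everything that referenced it.
# Maintained incrementally as "answer produced" events arrive.
dependents = defaultdict(set)

def on_answer_produced(answer_id, evidence_artifacts):
    for artifact in evidence_artifacts:
        dependents[artifact].add(answer_id)

on_answer_produced("answer-101", ["policy-v1", "runbook-3"])
on_answer_produced("answer-102", ["policy-v1"])

# "What breaks if policy-v1 changes?" is a single lookup:
print(sorted(dependents["policy-v1"]))  # ['answer-101', 'answer-102']
```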

Why GPU? Why append-only? Why any of this?

The honest answer is: because the alternative doesn't work at the scale we need.

A traditional database can track document status. You add a status column, write some application logic to update it, and call it done. But that approach has three problems that get worse as the system grows.

Correctness under concurrency. When multiple processes are ingesting documents, updating relations, and computing pressure signals simultaneously, keeping a mutable status column consistent requires locks, transactions, and careful coordination. Get it wrong and you have a document marked "active" that was superseded two seconds ago by a different process.

CoreCrux sidesteps this entirely. Events are appended to a log. Projections are recomputed from the log. There is no mutable state to corrupt. Two processes can write events concurrently without coordination, and the Projections will converge to the correct state regardless of the order the events are processed.

Auditability. When a regulator asks "why is this document marked as superseded, and when did that happen?" a mutable database can only answer if someone thought to add audit logging. And audit logs are application-level, not storage-level. They can be incomplete, inconsistent, or simply wrong.

In CoreCrux, the audit trail is the database. Every state change is an event. Every event is immutable. The answer to "when did this change and why" is always available, always complete, and always consistent with the current state, because the current state is computed from those same events.

Deterministic replay. This is the one that matters most for our use case.

When a CROWN receipt is produced, it's anchored to a specific cursor position in the event stream. Months later, someone can replay the events up to that cursor and reconstruct the exact Projection state that existed at receipt time: which artifacts were active, which were superseded, which relations were in play, which pressure signals were unresolved. Same bytes in, same state out. The evidence layer is fully reproducible.
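Cursor anchoring can be sketched as: at receipt time, record the cursor and a hash of the Projection state; at verification time, replay to the same cursor and compare hashes. The receipt fields and event shapes below are illustrative assumptions:

```python
import hashlib
import json

def state_hash(state):
    """Canonical hash of a Projection state: same bytes in, same hash out."""
    canonical = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def replay_to(events, cursor):
    """Replay events up to and including a cursor position."""
    state = {}
    for position, ev in enumerate(events):
        if position > cursor:
            break
        state[ev["artifact"]] = ev["status"]
    return state

events = [
    {"artifact": "policy-v1", "status": "active"},
    {"artifact": "policy-v1", "status": "superseded"},
]

# At receipt time: anchor to cursor 0 and record the state hash.
receipt = {"cursor": 0, "state_hash": state_hash(replay_to(events, 0))}

# Months later: replay to the same cursor and compare.
assert state_hash(replay_to(events, receipt["cursor"])) == receipt["state_hash"]
```

Because replay is deterministic, a hash mismatch can only mean the log itself was altered, which is exactly what the evidence layer needs to detect.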

An important caveat: deterministic replay applies to the Projection layer and the evidence it tracks, not to the full answer path. The LLM that synthesises an answer from evidence is not deterministic across model versions. If the model changes between receipt creation and verification, the same evidence could produce a different synthesis. Right now, we can't control that. The model is a dependency we don't own, and model providers change weights, fine-tuning, and output distributions without notice, sometimes without a changelog. This is an active area of work for us. Constraining and detecting LLM output drift is one of the harder open problems in the space, and we're not going to pretend we've solved it.

What we can do is detect when it happens.

Every CROWN receipt records a hash of the prompt template and the model identifier used at synthesis time. WatchCrux, our continuous monitoring service, runs model-drift sentinel packs: a curated set of queries across categories that are most sensitive to LLM behavioural shifts. When a model version changes, or when the same model starts producing different citation patterns on the same evidence, the sentinel pack catches it. It doesn't prevent drift. It makes drift visible, immediately and quantifiably, so that the system can flag affected receipts rather than silently serving answers whose synthesis no longer matches the evidence profile they were built on.
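A rough sketch of the detection logic. To be clear about what's invented: the function names, receipt fields, and the 0.9 citation-overlap threshold are all hypothetical illustrations, not WatchCrux's actual interface. Only the two ideas come from the text: fingerprint the prompt template and model identifier, and watch for citation-pattern shifts on fixed evidence.

```python
import hashlib

# Hypothetical receipt fields; names are illustrative, not WatchCrux's API.
def synthesis_fingerprint(prompt_template, model_id):
    return hashlib.sha256(f"{model_id}\n{prompt_template}".encode()).hexdigest()

def check_drift(receipt, current_model_id, current_template, sentinel_results):
    """Flag a receipt when the synthesis environment has shifted."""
    flags = []
    if receipt["fingerprint"] != synthesis_fingerprint(current_template,
                                                       current_model_id):
        flags.append("model_or_prompt_changed")
    # Sentinel packs: same evidence in, has the citation pattern moved?
    if sentinel_results["citation_overlap"] < 0.9:  # threshold is an assumption
        flags.append("citation_pattern_drift")
    return flags

receipt = {"fingerprint": synthesis_fingerprint("v1-template", "model-2026-01")}
flags = check_drift(receipt, "model-2026-03", "v1-template",
                    {"citation_overlap": 0.72})
print(flags)  # ['model_or_prompt_changed', 'citation_pattern_drift']
```

Note that nothing here prevents drift; it only makes it visible, which matches the claim in the text.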

This matters more than it might sound. Most AI systems treat model updates as invisible infrastructure changes. The API returns the same shape of response, so nothing looks different. But for a system that signs receipts about what evidence was used and how it was synthesised, a model that cites differently is a material change. The Projection layer guarantees that the evidence state is reproducible. The drift detection layer guarantees that if the synthesis layer has shifted, someone knows about it. The gap between those two guarantees, making the synthesis itself reproducible, is where the hardest remaining work lives.

Try reconstructing evidence state from a mutable database. You'd need a complete set of audit logs, a way to replay them in order, and confidence that nothing was missed or modified. It's theoretically possible. In practice, nobody does it.

As for the GPU: Projections are embarrassingly parallel. Each event updates one artifact's state independently of every other artifact. A GPU can process thousands of these updates simultaneously. On an RTX 4000 SFF Ada with 20GB of VRAM, the hot working set (every artifact's current status, every relation edge, every active pressure signal) fits comfortably in GPU memory. Cold data (the full event history, variable-length evidence references) lives on NVMe storage, accessed through GPUDirect Storage when the system needs to look back.

To give a sense of the throughput: a single GPU core processes one Projection update in about the same time it takes a traditional database to acquire a row lock. The GPU has 6,144 cores.

What this isn't

CoreCrux is not trying to replace Postgres. It's not a general-purpose database. You wouldn't use it to store user accounts or shopping carts or session tokens.

It's a specialised system for a specific problem: tracking the lifecycle and relationships of knowledge artifacts over time, with cryptographic verifiability and deterministic replay. It sits alongside traditional databases, not instead of them. The retrieval engine uses Postgres for document storage and Qdrant for vector search. CoreCrux handles what neither of those can: the living, evolving, pressure-sensitive state of what those documents mean in relation to each other.

If Postgres is a warehouse, efficient, reliable, holding everything in its place, CoreCrux is the institutional memory of the organisation that built the warehouse. It knows which shelves are stale, which items are contested, and which wall is going to need reinforcement if you add one more thing to it.

The quiet part

I've been deliberate about keeping the tone measured here, because the thing that makes CoreCrux interesting isn't speed or scale. It's the shape of the problem it solves.

Every organisation that uses AI-assisted decision-making will eventually face a question they cannot answer with their current infrastructure: "When the system told us X three months ago, was that still true? What evidence was it based on? Has that evidence changed since? What other decisions depended on it?"

Today, answering that question requires forensic reconstruction. Tomorrow, when the regulatory environment catches up to the deployment reality, it will require proof.

CoreCrux Projections are how you build proof into the substrate, not bolt it on afterwards. They track what's alive, what's connected, what's under pressure, and what depends on what, continuously, deterministically, and from the ground up.

It's not the flashiest piece of infrastructure we've built. But it might be the one that matters most.