2026-03-25

Submitting CROWN to SCITT

Today I sent a proposal to the IETF SCITT working group introducing CROWN, a protocol for proving what evidence an AI system used to reach its answer. What we built, what we changed to align with the standard, and what we proved along the way.

Today I sent an email to the IETF SCITT working group introducing CROWN as an application profile for AI retrieval-evidence provenance.

That sentence sounds dry. Let me tell you why it isn't.

The problem we're trying to solve

When an AI system retrieves evidence to support a decision, whether it's answering a compliance question, summarising a financial regulation, or telling you what your company's policy says about data retention, there is usually no verifiable trail of what happened. You get an answer. Maybe it cites some documents. But you can't prove which documents were in the retrieval set, whether they were current at query time, how confident the system was, or whether anyone tampered with the record after the fact.

For a demo, that's fine. For a regulated industry where an auditor asks "show me what evidence supported this answer six months ago", it's not fine. The gap between "agents are doing real work" and "we can prove what agents did" is still wide. CROWN is designed to close it.

A CROWN receipt proves three things: what evidence was retrieved (hashed citation set with quote hashes), that the evidence was current (knowledge-state cursor anchored to the corpus timeline), and that the receipt hasn't been tampered with (BLAKE3 hash chain and Ed25519 signatures). Receipts come in three assurance levels: light, verified, and audit.
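To make the three guarantees concrete, here is a minimal sketch of a hash-chained receipt. Field names are hypothetical, and `sha256` stands in for BLAKE3 (which is not in the Python standard library); real CROWN receipts use BLAKE3 and Ed25519 signatures on top of this chaining.

```python
import hashlib
import json

def h(data: bytes) -> str:
    # Stand-in for BLAKE3: the Python stdlib lacks it, so sha256 is used here.
    return hashlib.sha256(data).hexdigest()

def make_receipt(citations, cursor, prev_hash):
    """Illustrative receipt: hashed citation set with quote hashes, a
    knowledge-state cursor, and a chain link to the previous receipt."""
    citation_set = [
        {"doc-hash": h(doc.encode()), "quote-hash": h(quote.encode())}
        for doc, quote in citations
    ]
    body = {
        "citation-set": citation_set,
        "knowledge-state-cursor": cursor,   # corpus-timeline anchor
        "prev-receipt-hash": prev_hash,     # tamper-evident chain link
    }
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return {**body, "receipt-hash": h(canonical.encode())}

r1 = make_receipt([("policy-v3.pdf", "Data is retained for 90 days.")],
                  cursor="2026-03-25T09:00:00Z", prev_hash=None)
r2 = make_receipt([("policy-v3.pdf", "Backups are encrypted at rest.")],
                  cursor="2026-03-25T09:05:00Z", prev_hash=r1["receipt-hash"])
```

Because each receipt's hash covers the previous receipt's hash, altering any record after the fact breaks every later link in the chain.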

That's the protocol. The question was always: does anyone else care about this problem, and is there an existing standards framework where it fits?

Why SCITT

SCITT stands for Supply Chain Integrity, Transparency, and Trust. It's an IETF working group that's been building a standard for signed statements about supply chain artifacts. The architecture is content-agnostic by design: you define what your signed statement means (that's the application profile), and SCITT provides the envelope format (COSE_Sign1), the registration flow (SCRAPI), and the transparency log that gives you third-party verifiability.

There are already application profiles being developed for AI-adjacent use cases. Kamimura's CAP-SRP handles refusal provenance, proving that an AI system refused a request and why. VCP handles algorithmic trading audit trails. CROWN addresses a different layer: when the system did answer, was the answer grounded in specific, current evidence?

These are complementary. A complete AI audit trail in a regulated industry might need all three. The SCITT architecture lets them share the same verification infrastructure while each defines its own claim semantics.

So SCITT is the right home. The question was whether CROWN could actually align with its primitives cleanly or whether we'd be forcing a square peg into a round hole.

What we built to align

The alignment work was substantial. CROWN existed as a JSON-based protocol with its own signing and verification before any of this. Getting it into SCITT shape meant implementing a real COSE_Sign1 encoding path, not as a side project or a spec document, but in the production system.

Here's what that involved.

COSE_Sign1 wrapping. Every CROWN receipt is now wrapped in a COSE_Sign1 envelope per RFC 9052. The canonical JSON receipt is serialised to CBOR with kebab-case keys (matching our CDDL schema), then signed with Ed25519 via HashiCorp Vault Transit. The protected header carries the algorithm identifier, content type (application/vnd.crown.receipt+cbor), key ID, and CWT Claims with issuer and subject URNs. The API supports Accept: application/cose for raw envelope retrieval.
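The envelope described above can be sketched structurally. This is not a real CBOR encoding: Python lists and dicts stand in for the CBOR array and header maps, the key ID and URNs are hypothetical, and the payload and signature are placeholders. The header labels follow RFC 9052 (alg = 1, content type = 3, kid = 4) and RFC 9597 (CWT Claims = 15).

```python
# Structural sketch of a COSE_Sign1 envelope (RFC 9052).
ALG_EDDSA = -8        # COSE algorithm identifier for EdDSA (Ed25519)
HDR_ALG, HDR_CTY, HDR_KID, HDR_CWT_CLAIMS = 1, 3, 4, 15  # header labels
CWT_ISS, CWT_SUB = 1, 2                                  # CWT claim keys

protected = {
    HDR_ALG: ALG_EDDSA,
    HDR_CTY: "application/vnd.crown.receipt+cbor",
    HDR_KID: b"vault-transit-key-1",                 # hypothetical key ID
    HDR_CWT_CLAIMS: {CWT_ISS: "urn:example:issuer",  # hypothetical URNs
                     CWT_SUB: "urn:example:receipt:123"},
}

payload = b"<CBOR-encoded CROWN receipt bytes>"      # placeholder
signature = b"<64-byte Ed25519 signature>"           # placeholder

# COSE_Sign1 is a four-element array:
# [protected header, unprotected header, payload, signature].
cose_sign1 = [protected, {}, payload, signature]
```

A real implementation would CBOR-encode the protected header to a byte string, build the `Sig_structure` over it and the payload, and sign that with Ed25519.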

CDDL schema. We published a formal CBOR type definition at crown-receipt.cddl, modelled on existing SCITT application profile patterns. Every field in a CROWN receipt has a CDDL type. The camelCase-to-kebab-case transformation happens at serialisation time and is deterministic.
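The deterministic key transform is small enough to show in full. This is a sketch of the idea, not CueCrux's actual serialiser; the helper names are mine.

```python
import re

def to_kebab(name: str) -> str:
    """Deterministic camelCase -> kebab-case key transform."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"-\1", name).lower()

def kebab_keys(obj):
    """Recursively rename dict keys at serialisation time; values pass
    through unchanged."""
    if isinstance(obj, dict):
        return {to_kebab(k): kebab_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [kebab_keys(v) for v in obj]
    return obj
```

Determinism matters here: two serialisations of the same receipt must produce byte-identical CBOR, or the hash binding breaks.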

Receipt schema versioning. We shipped schema 1.1, which adds llmModel and llmRequestId to the canonical receipt payload. These fields are hash-bound via BLAKE3, meaning you can prove which language model produced the answer and trace it to a specific API request. The versioning uses a clean undefined vs null distinction: undefined means schema 1.0 (field omitted from hash), null means schema 1.1 where the LLM was not called (field included in hash as null). This matters for light-mode receipts where no LLM is involved.
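The undefined-versus-null distinction can be sketched directly. An omitted field models schema 1.0 (excluded from the canonical payload and hence the hash); an explicit `None` models schema 1.1 with no LLM call (serialised as `null` and included in the hash). Field names are illustrative, and sha256 again stands in for BLAKE3.

```python
import hashlib
import json

_OMIT = object()  # sentinel for "undefined": field absent from the payload

def canonical_payload(answer, llm_model=_OMIT, llm_request_id=_OMIT):
    """Build the canonical payload; omitted fields never enter it."""
    payload = {"answer": answer}
    if llm_model is not _OMIT:
        payload["llm-model"] = llm_model
    if llm_request_id is not _OMIT:
        payload["llm-request-id"] = llm_request_id
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))

def payload_hash(canonical: str) -> str:
    # sha256 stands in for BLAKE3, which is not in the Python stdlib
    return hashlib.sha256(canonical.encode()).hexdigest()

v10 = canonical_payload("42")                        # schema 1.0: omitted
v11_light = canonical_payload("42", llm_model=None,
                              llm_request_id=None)   # schema 1.1, no LLM call
```

The two canonical forms differ, so the two hashes differ: a verifier can tell a pre-1.1 receipt apart from a 1.1 light-mode receipt from the hash-bound payload alone.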

Standalone verification. The crown-verify CLI can verify receipt hash chains, chain linkage, and COSE_Sign1 envelopes without any CueCrux infrastructure. It's a zero-dependency verification path. You download the binary, point it at a .cbor file and a public key, and it tells you whether the signature is valid, what the issuer and subject are, and whether the embedded receipt hash matches.
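The chain-verification part of that offline check can be sketched as follows. This is my reconstruction of the logic, not crown-verify's source; field names are hypothetical, sha256 stands in for BLAKE3, and the COSE_Sign1 signature check is out of scope here.

```python
import hashlib
import json

def receipt_hash(body: dict) -> str:
    # sha256 stands in for BLAKE3, which is not in the Python stdlib
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_chain(receipts) -> bool:
    """Check (1) each receipt's hash matches its body and (2) each receipt
    links to the previous receipt's hash. A real verifier would also check
    the Ed25519 signature on the COSE_Sign1 envelope."""
    prev = None
    for r in receipts:
        body = {k: v for k, v in r.items() if k != "receipt-hash"}
        if r["receipt-hash"] != receipt_hash(body):
            return False                 # body was tampered with
        if body.get("prev-receipt-hash") != prev:
            return False                 # chain linkage is broken
        prev = r["receipt-hash"]
    return True
```

Both checks run on nothing but the receipt bytes and a hash function, which is what makes a zero-dependency verification path possible.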

Interop pack. We built a complete end-to-end walkthrough showing one canonical receipt through the full SCITT path: receipt creation, CBOR encoding, COSE_Sign1 wrapping, SCRAPI registration, and verification. The SCRAPI registration step is illustrative because no operational Transparency Service currently accepts third-party application profiles. That's an ecosystem gap, not ours, and we say so explicitly in the documentation.

How we tested it

This is the part I'm most confident about, because the numbers are real and reproducible.

The audit suite (published at AuditCrux, MIT license) covers twelve test categories across 1074 unique documents and 462 queries. It measures everything from supersession accuracy and causal chain retrieval to format-aware ingestion recall, fragility calibration, receipt chain integrity, and hard-negative overlap resistance.

Phase 7.4, the current baseline, passed 12/12 categories across five independent runs on the production server. The runs are 037b303a, 80434381, 69341abe, e0bfbd9b, and fabf5dc8. Every run ID is published. Every claim can be reproduced.

The numbers that matter: Cat 2 (format-aware citation recall) ranges from 0.633 to 0.693 across the five runs; that spread is LLM-contingent variance. Cat 7 (broad recall), Cat 8 (proposition precision at 1), Cat 11 (chunking stress broad recall), and Cat 12 (hard-negative overlap parent-child recall) are perfectly deterministic across all five runs. Cat 5 (receipt chain stress) confirms schema 1.1 receipts chain correctly at depth.

Phase 7.4 added LLM metadata binding. Zero retrieval code was changed. The five-run validation confirms the quality baseline from Phase 7.3 was maintained with no regression.

The suite separates pipeline retrieval quality from LLM citation selection, which is important. When retrieved recall is 1.0 but citation recall is 0.33, the pipeline found everything; the LLM simply chose to cite different documents. That distinction matters for understanding where failures actually are.
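The two metrics are the same recall formula applied to different selection sets. A hypothetical worked example, matching the 1.0-versus-0.33 case above:

```python
def recall(selected, relevant):
    """Fraction of the relevant documents present in the selected set."""
    return len(set(selected) & set(relevant)) / len(relevant)

# Hypothetical query: the pipeline retrieved every relevant document,
# but the LLM cited only one of the three in its answer.
relevant  = ["doc-a", "doc-b", "doc-c"]
retrieved = ["doc-a", "doc-b", "doc-c", "doc-x"]
cited     = ["doc-a"]

retrieved_recall = recall(retrieved, relevant)  # 1.0: pipeline found all
citation_recall  = recall(cited, relevant)      # ~0.33: LLM cited narrowly
```

Scoring only citation recall would blame the pipeline for a choice the LLM made; scoring both localises the failure.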

What's published

Everything is at ResearchCrux under CC BY 4.0. The audit suite is at AuditCrux under MIT.

ResearchCrux contains the protocol spec, SCITT compatibility layer (CDDL, COSE examples, interop pack, registration policy, privacy considerations), the benchmark ledger with per-run evidence, a proof gallery with full and redacted receipt examples, regulatory mappings for EU AI Act and DORA, and the verification library.

AuditCrux contains the full benchmark suite with every category definition, corpus, ground truth, and scoring function. Anyone can run it against their own Engine instance.

What we're honest about

The SCITT integration document labels itself Version 0.2, Pre-submission Review. That's deliberate.

The main gap is live Transparency Service interop. No CROWN receipt has been registered with a live SCITT Transparency Service and verified via a returned inclusion proof. The full encoding path is walked end-to-end in the interop pack, but the SCRAPI registration and TS Receipt steps are illustrative. As of today, no operational TS accepts third-party application profiles. We state this explicitly in the documentation because it's better to pin your own caveats to the wall than let someone else discover them dramatically.

IANA content-type registration is deferred until the profile reaches stability that warrants it. Multi-TS registration guidance is deferred because it's a deployment concern, not a protocol concern.

What happens next

I don't know. That's the honest answer. The email asks the SCITT working group for three things: feedback on the CROWN-to-SCITT mapping, interest in AI evidence provenance as a use case, and guidance on next steps toward a formal application profile document.

The protocol works. The implementation works. The evidence is published and reproducible. Whether the standards community sees AI retrieval-evidence provenance as a meaningful addition to the SCITT framework is a question only they can answer.

I think it is. When agents start making decisions that affect regulated industries, someone is going to ask "prove what evidence you used." CROWN is one answer to that question, built to fit inside an architecture that already handles signed statements, transparency logs, and third-party verification.

If the working group is interested, we'll move toward a formal Internet-Draft. If they have concerns about the mapping, we'll address them. If they think the use case doesn't belong in SCITT, that's useful feedback too.

Either way, the protocol and the evidence are public. Anyone can verify the claims. That's rather the point.