Now

Last updated: 2026-03-24

Current focus

Retrieval engine hardening and audit validation

The retrieval engine has passed 13/13 audit categories across three consecutive runs with zero variance. Phase 7.3 is the current production baseline.

Recent work has focused on:

  • Format-aware citation improvements for structured data (JSON, YAML, CSV)
  • Relation-pair preservation in the citation controller for parent-child document recall
  • Model-drift sentinel packs to detect when LLM behaviour shifts under the same evidence
  • Config manifest enforcement: 23 feature flags locked and verified before every audit run

The engine is now stable enough that the work has shifted from making it pass to proving it stays passed.
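The manifest enforcement step above is mechanically simple: every locked flag must match the manifest before an audit run starts. A minimal sketch of that check; the names (`MANIFEST`, `verify_manifest`) and the flag subset are hypothetical illustrations, not the engine's actual 23-flag manifest.

```python
# Illustrative config manifest check: a drifted flag blocks the audit run.
MANIFEST = {  # locked flag -> required value (hypothetical subset)
    "multi_lane_rrf": False,
    "semantic_chunking": False,
    "format_aware_citations": True,
}

def verify_manifest(runtime_flags: dict) -> list[str]:
    """Return a list of violations; an empty list means the run may proceed."""
    violations = []
    for flag, required in MANIFEST.items():
        actual = runtime_flags.get(flag)
        if actual != required:
            violations.append(f"{flag}: expected {required!r}, got {actual!r}")
    return violations

# A drifted config is caught before the audit starts.
bad = verify_manifest({"multi_lane_rrf": True, "semantic_chunking": False,
                       "format_aware_citations": True})
ok = verify_manifest({"multi_lane_rrf": False, "semantic_chunking": False,
                      "format_aware_citations": True})
```

The point of locking before every run is reproducibility: a 13/13 result only counts if the configuration that produced it is provably the same each time.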

CROWN protocol and SCITT submission

CROWN produces cryptographic receipts that anchor AI-assisted decisions to the evidence that informed them. The M8 milestone deployed COSE_Sign1 signing and was revalidated at 13/13 across three runs.
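One way to picture what a receipt anchors: a digest of the evidence set bound to the decision it informed. This is an illustrative stdlib sketch of the to-be-signed payload only; real CROWN receipts are CBOR-encoded and wrapped in a COSE_Sign1 envelope, which is omitted here, and the field names are assumptions rather than the actual schema.

```python
import hashlib
import json

def evidence_digest(evidence_items: list[bytes]) -> str:
    """Hash each evidence item, then hash the sorted digests so the
    result is independent of retrieval order."""
    digests = sorted(hashlib.sha256(item).hexdigest() for item in evidence_items)
    return hashlib.sha256("".join(digests).encode()).hexdigest()

def receipt_payload(decision: str, evidence_items: list[bytes]) -> bytes:
    # Canonical JSON stands in for CBOR here; a real receipt would be
    # CBOR-encoded and signed as the COSE_Sign1 payload.
    payload = {
        "decision": decision,
        "evidence_sha256": evidence_digest(evidence_items),
    }
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()

p1 = receipt_payload("approve", [b"doc-a", b"doc-b"])
p2 = receipt_payload("approve", [b"doc-b", b"doc-a"])  # order-independent
p3 = receipt_payload("approve", [b"doc-a", b"doc-x"])  # different evidence
```

Any change to the evidence set changes the payload, which is what makes the receipt an anchor rather than a log entry.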

Current focus:

  • Preparing the IETF SCITT working group submission with test vectors and standalone verification library
  • CDDL schemas (RFC 8610) for CBOR-encoded receipts
  • Regulatory mapping covering EU AI Act Articles 13-14 and DORA Articles 8-11

The goal is to establish AI evidence provenance as a use case for SCITT transparency infrastructure.

CoreCrux: Projections and the decision plane

CoreCrux is a GPU-backed append-only event spine that tracks the lifecycle and relationships of knowledge artifacts. It runs on a dedicated RTX 4000 SFF Ada with 20 GB of VRAM.

Recent milestones:

  • Projections enabled in production: artifact living state, relations, pressure events, and dependents
  • Projection snapshots bootstrapped via corecruxctl
  • Decision plane read path under active investigation (shard lock contention)

CoreCrux is the evidence substrate underneath VaultCrux and the retrieval engine. Getting the decision plane read path working cleanly is the current blocker.
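The projection idea above can be sketched as a fold over the append-only event log: each projection is a pure function of the events so far, so a snapshot can always be rebuilt from scratch, which is what makes bootstrapping via corecruxctl safe. Event and field names here are hypothetical, not CoreCrux's actual schema.

```python
def project_living_state(events: list[dict]) -> dict:
    """Fold append-only events into a 'living state' projection: the
    latest status per artifact. Replaying the full log always yields
    the same snapshot."""
    state: dict[str, str] = {}
    for event in events:  # events are never mutated, only appended
        if event["type"] == "artifact_created":
            state[event["artifact"]] = "active"
        elif event["type"] == "artifact_retired":
            state[event["artifact"]] = "retired"
    return state

log = [
    {"type": "artifact_created", "artifact": "spec-001"},
    {"type": "artifact_created", "artifact": "spec-002"},
    {"type": "artifact_retired", "artifact": "spec-001"},
]
snapshot = project_living_state(log)
```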

MemoryCrux: planning the agent memory layer

MemoryCrux is a unified memory layer that sits between AI agents and the organisational context they need to act safely. The Master Plan is at v2.1.

Phase 4 (Agent Effectiveness and External Surfaces) is in planning, covering:

  • Coverage assessment: telling agents what they don't know relative to a task
  • Scoped context briefing: task-shaped, budget-aware context payloads
  • Contextual escalation: preserving reasoning state when agents hit uncertainty
  • Decision checkpoints: receipted snapshots of active decision state
  • Constraint suggestions: letting agents contribute institutional knowledge for human review
  • External proof surfaces: decision chain proof pages and cross-platform receipt anchors

Phase 4 addresses the agent memory wall: sessions measured in hours, projects spanning months, and no institutional context to bridge the gap.
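Of the capabilities above, scoped context briefing is the most mechanical to picture: rank candidate context by task relevance, then pack it greedily under a token budget. A minimal sketch under assumed names and data shapes; the real Phase 4 design may work quite differently.

```python
def brief(candidates: list[tuple[str, float, int]], budget: int) -> list[str]:
    """Greedy budget-aware packing. Candidates are (text, relevance,
    token_cost); take the most relevant items that still fit the
    remaining token budget."""
    chosen, remaining = [], budget
    for text, _, cost in sorted(candidates, key=lambda c: c[1], reverse=True):
        if cost <= remaining:
            chosen.append(text)
            remaining -= cost
    return chosen

payload = brief(
    [("deploy runbook", 0.9, 400),
     ("org chart", 0.2, 300),
     ("incident log", 0.7, 500)],
    budget=900,
)
```

The "task-shaped" part is the relevance score; the "budget-aware" part is the token ceiling, so the agent gets the densest context that fits rather than everything available.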

Writing

I've been writing more frequently about what building these systems actually looks like from the inside.

  • Thirteen Out of Thirteen: the full audit story, from four categories to thirteen
  • Three Versions of the Same Question: how the retrieval engine evolved across major versions
  • What Happens When You Hit Submit: the end-to-end query lifecycle
  • The Database That Remembers Why: CoreCrux Projections explained
  • The Model You Don't Own: model drift as a first-class release variable
  • It's All About the Context: the agent memory problem and how MemoryCrux addresses it

The writing functions as both a public record of the work and a way to pressure-test ideas before they become architecture.

Active work

The technical focus right now:

  • Engine Phase 7.3 frozen as production baseline (config manifest 6.7)
  • CoreCrux decision plane read path (shard lock contention fix)
  • MemoryCrux Phase 4 ExecPlan (9 milestones, 7 new MCP tools, 2 external surfaces)
  • CROWN SCITT submission preparation
  • Embedder pool infrastructure on dedicated hardware (CueCrux-Data-1)
  • Overseer site updates and new writing

This phase is about proving what we've built holds up, documenting it honestly, and planning the next layer (agent memory) before building it.

Paused or deprioritised

Some things are deliberately on hold:

Data Quality Pipeline advanced features

Semantic chunking, HyDE, and quality gating were empirically tested and made the system worse. The negative results are published. The baseline stays until there is evidence that specific DQP features help specific categories without regressing others.

Multi-lane retrieval

Ablation proved multi-lane RRF dilutes quality. Feature flag is off in production. May revisit if corpus characteristics change.
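For context, the fusion in question is reciprocal rank fusion (RRF), where each lane's ranking contributes 1/(k + rank) per document. This is a generic sketch of the standard formula (k = 60 is the conventional constant), not the engine's implementation; the ablation showed this kind of fusion diluting quality on this corpus.

```python
def rrf(lanes: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lanes of 1/(k + rank_d),
    with rank starting at 1. Documents seen by more lanes, at better
    ranks, float to the top of the fused list."""
    scores: dict[str, float] = {}
    for lane in lanes:
        for rank, doc in enumerate(lane, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf([["a", "b", "c"], ["b", "c", "d"]])
```

The dilution risk is visible in the formula itself: a weak lane still contributes score to every document it returns, which can push lane-agreement above single-lane precision.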

Broad AI commentary

I'm avoiding generic takes on AI progress or capability races unless they connect directly to system behaviour and trust.

These may return later. They're paused because the evidence said to pause them.

How this is likely to change

Once MemoryCrux Phase 4 moves from planning to implementation, the focus will shift to agent effectiveness tooling and the constraint surface. The SCITT submission will open conversations about transparency service registration. And the writing will continue to follow the work.

  • MemoryCrux build: coverage assessment, context briefing, constraint suggestions
  • Decision plane resolution: unblocking CoreCrux read path for downstream consumers
  • External proof surfaces: decision chain proof pages visible beyond the platform boundary
  • Regulatory alignment: EU AI Act Article 13 transparency requirements (August 2026)

When that happens, this page will change.

Best place to follow changes

The writing remains the best place to track how the work evolves.

That's where uncertainty is worked through in public, before it shows up elsewhere.

Writing