Writing

Essays, notes, and occasional updates on AI, automation, trust, evidence, and the systems I’m building and questioning. The topics drift, but the through-line is simple: how answers behave once they’re reused, automated, and relied on at scale.

2026-03-27

The Metric No One Is Measuring

AI agents are benchmarked on reasoning, code generation, and task completion. But nobody measures whether they can maintain effective context across a real work session. We built a scoring framework to find out, and the test itself proved the point.

2026-03-25

Eight Doors Instead of One

VaultCrux has fifty tools behind a single MCP endpoint. We split them into eight scoped surfaces. What that means for agents, tokens, and the principle that access should match intent.

2026-03-25

Submitting CROWN to SCITT

Today I sent a proposal to the IETF SCITT working group introducing CROWN, a protocol for proving what evidence an AI system used to reach its answer. What we built, what we changed to align with the standard, and what we proved along the way.

2026-03-25

The Agent Memory Wall: Why Your Agents Fail at Jobs, Not Tasks

AI agents can execute tasks but can't hold jobs. The missing piece isn't capability — it's institutional memory. Why an external memory plane with constraints, provenance, and decision checkpointing is the architectural shift agents need.

2026-03-25

Twenty-Eight Tools and the Ones That Matter

The MemoryCrux MCP server exposes twenty-eight tools to every agent session. Most sessions need three. What happens when you optimise for the conversation that actually occurs instead of every conversation that could.

2026-03-24

It's All About the Context

What happens when your AI agent forgets everything between sessions, your master plans go stale, and your context window fills up with things that don't matter. A story about building CueCrux and learning, the hard way, that the real bottleneck was never intelligence.

2026-03-22

The Database That Remembers Why

What CoreCrux Projections are, how they differ from traditional databases, and why a GPU-backed append-only event spine quietly solves problems that most systems don't know they have.

2026-03-22

The Model You Don't Own

What happened after 13/13, from stabilisation to quality baseline to the realisation that the biggest variable in the system isn't ours to control.

2026-03-21

What Happens When You Hit Submit

The end-to-end journey of a query through CueCrux, from the moment you ask a question to the moment the system starts watching what it told you.

2026-03-20

Thirteen Out of Thirteen

The full story of testing CueCrux's retrieval engine, from four categories to thirteen, from clean text to adversarial hard-negatives, and why submitting CROWN to the SCITT working group feels like the right next step.

2026-03-20

Three Versions of the Same Question

How the CueCrux retrieval engine evolved across three major versions, why V4.1 is fundamentally different from anything else in the market, and what the benchmarks actually show.

2026-01-27

The Answer That Felt Right

A reflection on how fragile decisions pass as solid when assumptions stay invisible.

2026-01-26

Silence Is Success: Why a Quiet Watch List Is the Best Outcome

Why CueCrux Watch is designed to stay quiet until something you rely on truly changes, and what counts as meaningful change.

2026-01-11

When Facts Freeze

Why many facts are just decisions with short time horizons, and what happens when we stop revisiting the conditions underneath.

2026-01-10

Why We Published the CueCrux Whitepapers (and Why Now)

Why the CueCrux whitepapers exist, why they’re being published now, and how to read them.

2026-01-08

How to Read The Shape of Knowing

A guide to reading the book as a diagnostic: slow, reflective, and focused on what answers depend on.

2026-01-07

When Confidence Starts Masquerading as Knowledge

How confident answers scale, assumptions disappear, and fragility hides inside certainty.

2026-01-05

An Update on CueCrux

A slightly more informal January update on CueCrux progress toward the MVP.

2026-01-03

When AI Gets January Wrong

January resets incentives; forecasts that travel without conditions become brittle.

2026-01-02

The Operator In Trust Role: How I’m Building CueCrux With an AI Board

Why CueCrux is built as an agency-driven system, and what operating in trust means when agents do the operational work.

2026-01-01

Why I Don’t Trust Answers That Arrive Too Smoothly

Why confidence scales faster than the conditions that make answers safe to reuse.