Twenty-Eight Tools and the Ones That Matter
The MemoryCrux MCP server exposes twenty-eight tools to every agent session. Most sessions need three. What happens when you optimise for the conversation that actually occurs instead of every conversation that could.
The MemoryCrux MCP server, the one that bridges AI agents to organisational memory, exposes twenty-eight tools. Every one of them exists for a reason. And every one of them costs tokens just by existing.
When a language model connects to an MCP server, it doesn't load tools on demand. It loads the full manifest: every tool name, every description, every parameter schema. The model needs this information to decide which tool to call, in the same way you need to see the menu before you can order. But unlike a restaurant's menu, this one has a cost. Every tool definition consumes context window space. Every parameter schema, every description string, every enum value: all of it compiles into tokens that sit in the window for the entire session, whether the tool is used or not.
I measured it. The full MemoryCrux tool manifest, twenty-eight tools with their Zod-validated schemas, compiles to roughly 4,200 tokens. That's the cost of the menu, before the agent has asked a single question.
For context, the entire task-relevant briefing from get_relevant_context typically runs 800 to 2,000 tokens. The menu is at least twice the size of the meal.
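To make the measurement concrete, here is a minimal sketch of how a manifest cost estimate like that can be computed. It uses the common rough heuristic of about four characters per token rather than a real tokenizer, and the tool definition shown is illustrative, not MemoryCrux's actual schema.

```typescript
// Rough manifest cost estimate using the ~4 chars/token heuristic.
// The tool definition below is illustrative, not a real MemoryCrux schema.

interface ToolDef {
  name: string;
  description: string;
  schemaJson: string; // serialised parameter schema, as the model sees it
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude heuristic, not a real tokenizer
}

function manifestCost(tools: ToolDef[]): number {
  return tools.reduce(
    (sum, t) => sum + estimateTokens(t.name + t.description + t.schemaJson),
    0,
  );
}

// Even a modest tool definition runs to dozens of tokens; multiply by
// twenty-eight and the manifest dwarfs most individual retrievals.
const example: ToolDef = {
  name: "query_memory",
  description: "Keyword-weighted retrieval across the full knowledge base.",
  schemaJson: JSON.stringify({
    type: "object",
    properties: { query: { type: "string" }, limit: { type: "number" } },
    required: ["query"],
  }),
};

console.log(manifestCost([example]));
```

Summing that over a full registry is how a figure like 4,200 tokens falls out of twenty-eight tools.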
What agents actually use
I've been watching tool call patterns across sessions for the past month. Here's what I found.
In a typical coding session on the retrieval engine, the agent calls three to five MemoryCrux tools: get_relevant_context at session start, check_constraints before mutations, occasionally query_memory for a specific lookup, and checkpoint_decision_state at natural breakpoints.
In an audit session, the pattern shifts: assess_coverage to identify blind spots, get_freshness_report to check for stale context, check_claim to verify assumptions against the knowledge base.
In a planning session, it's get_decision_context and get_causal_chain to understand what was decided before, get_contradictions to find conflicting information, and declare_constraint when a boundary is discovered.
Three to five tools per session type. Out of twenty-eight. The remaining twenty-three to twenty-five sit in the context window, consuming tokens and adding noise to tool selection.
The tool selection problem
This isn't just a token efficiency issue. It's an accuracy issue.
When a language model faces twenty-eight tool options, it has to discriminate between tools with overlapping names and adjacent purposes. query_memory and get_relevant_context both retrieve knowledge. check_constraints and verify_before_acting both validate actions. get_decision_context and reconstruct_knowledge_state both look backward at decisions.
The distinctions are real and meaningful to the system. query_memory does keyword-weighted retrieval across the full knowledge base. get_relevant_context does task-scoped, budget-aware retrieval ranked by risk-if-missed. They solve different problems. But the model has to parse the descriptions and parameter schemas to understand that distinction, and it has to do it correctly every time.
With twenty-eight tools in the manifest, the model occasionally picks the wrong one. It calls query_memory when get_relevant_context would have given a better result. It calls get_decision_context when get_checkpoints would have been faster. These aren't catastrophic errors. But they cost time, they sometimes cost a follow-up call to get the right data, and they erode the precision that the whole system is designed to provide.
The shape of the solution
The answer isn't to remove tools. Every one of the twenty-eight exists because a real session needed it. The answer is to stop treating the MCP connection as a monolith.
What I'm working toward is session-typed tool profiles. When an agent connects to MemoryCrux, it declares what kind of session it's starting: coding, audit, planning, review, or exploration. The MCP server responds with a filtered tool manifest, only the tools that session type typically needs, plus a meta-tool that can request additional tools if the session evolves.
A coding session gets seven tools: get_relevant_context, check_constraints, verify_before_acting, query_memory, checkpoint_decision_state, suggest_constraint, and escalate_with_context. That's the working set for an agent that's reading code, making changes, and validating its actions against organisational boundaries.
An audit session gets ten tools: the coding set minus suggest_constraint, plus assess_coverage, get_freshness_report, check_claim, and get_contradictions. That's the investigative toolkit.
A planning session gets ten: get_decision_context, get_causal_chain, reconstruct_knowledge_state, get_correction_chain, get_decisions_on_stale_context, query_memory, list_topics, get_contradictions, declare_constraint, and checkpoint_decision_state.
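The profiles above amount to a static map from session type to working set, with the server filtering its full registry at connect time. A minimal sketch, using the tool names listed above; the registry shape and the filterManifest helper are assumptions for illustration, not MemoryCrux's actual API.

```typescript
// Hypothetical session-typed profiles: at connect time the server filters
// its full tool registry down to the declared session type's working set.

type SessionType = "coding" | "audit" | "planning";

const PROFILES: Record<SessionType, string[]> = {
  coding: [
    "get_relevant_context", "check_constraints", "verify_before_acting",
    "query_memory", "checkpoint_decision_state", "suggest_constraint",
    "escalate_with_context",
  ],
  audit: [
    // the coding set minus suggest_constraint ...
    "get_relevant_context", "check_constraints", "verify_before_acting",
    "query_memory", "checkpoint_decision_state", "escalate_with_context",
    // ... plus the investigative tools
    "assess_coverage", "get_freshness_report", "check_claim",
    "get_contradictions",
  ],
  planning: [
    "get_decision_context", "get_causal_chain", "reconstruct_knowledge_state",
    "get_correction_chain", "get_decisions_on_stale_context", "query_memory",
    "list_topics", "get_contradictions", "declare_constraint",
    "checkpoint_decision_state",
  ],
};

// The escape hatch is always present, whatever the profile.
const META_TOOL = "request_tools";

function filterManifest(fullRegistry: string[], session: SessionType): string[] {
  const wanted = new Set([...PROFILES[session], META_TOOL]);
  return fullRegistry.filter((name) => wanted.has(name));
}
```

Keeping the profiles as plain data like this also makes them easy to adjust later from observed tool call patterns rather than code changes.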
The numbers: a coding profile compiles to roughly 1,100 tokens. An audit profile to about 1,400. A planning profile to about 1,700. All of them are less than half the cost of the full manifest. And in each case, the model is choosing between seven to ten tools instead of twenty-eight, which means tool selection accuracy goes up because there's less ambiguity to resolve.
The meta-tool escape hatch
The risk of profiling is that you guess wrong. An agent starts a coding session and then needs to check the causal chain because it found a decision it doesn't understand. If get_causal_chain isn't in the manifest, the agent is stuck.
The escape hatch is a single meta-tool, request_tools, that's always present in every profile. The agent calls it with a natural language description of what it needs: "I need to trace the causal chain of a decision." The meta-tool looks up the full registry, finds the matching tool, and dynamically adds it to the session. The token cost is paid only if the tool is actually needed.
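A sketch of what that meta-tool could look like follows. The keyword-overlap matching is an illustrative stand-in of my own; a production version might match the natural language request against tool descriptions with embeddings instead.

```typescript
// Minimal sketch of a request_tools meta-tool: match a natural-language
// description of need against the full registry and add the best hit to
// the live session manifest. Keyword overlap here is a naive stand-in
// for real semantic matching.

interface RegisteredTool {
  name: string;
  description: string;
}

function scoreMatch(need: string, tool: RegisteredTool): number {
  // split on non-word characters and underscores so tool names contribute
  const needWords = new Set(need.toLowerCase().split(/[\W_]+/).filter(Boolean));
  const toolWords = (tool.name + " " + tool.description)
    .toLowerCase()
    .split(/[\W_]+/)
    .filter(Boolean);
  return toolWords.filter((w) => needWords.has(w)).length;
}

function requestTools(
  need: string,
  registry: RegisteredTool[],
  activeManifest: Set<string>,
): RegisteredTool | undefined {
  const candidates = registry.filter((t) => !activeManifest.has(t.name));
  const best = candidates
    .map((t) => ({ t, score: scoreMatch(need, t) }))
    .sort((a, b) => b.score - a.score)[0];
  if (!best || best.score === 0) return undefined;
  activeManifest.add(best.t.name); // the token cost is paid only here
  return best.t;
}
```

A request like "I need to trace the causal chain of a decision" would score highest against a tool whose name and description mention causal chains, and that tool alone gets added to the session.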
This is the key insight: pay for the common case statically and the uncommon case dynamically. Most sessions stay within their profile. The few that don't pay a small additional cost for the tools they actually need, rather than every session paying for every tool that any session might need.
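The trade reduces to simple arithmetic: profiling wins as long as the profile cost plus any dynamically loaded tools stays under the full-manifest cost. The 4,200 and 1,100 token figures below come from the measurements above; the roughly 150 tokens per dynamically added tool is my assumption for illustration.

```typescript
// Break-even check for profiling vs. loading the full manifest, in tokens.
// FULL_MANIFEST and CODING_PROFILE reuse the article's measured estimates;
// TOKENS_PER_DYNAMIC_TOOL is an assumed average per-tool definition cost.

const FULL_MANIFEST = 4200;
const CODING_PROFILE = 1100;
const TOKENS_PER_DYNAMIC_TOOL = 150;

function profiledSessionCost(dynamicAdds: number): number {
  return CODING_PROFILE + dynamicAdds * TOKENS_PER_DYNAMIC_TOOL;
}

// How many out-of-profile tools a session can request before profiling
// stops saving tokens relative to loading everything up front.
const breakEven = Math.floor(
  (FULL_MANIFEST - CODING_PROFILE) / TOKENS_PER_DYNAMIC_TOOL,
);
```

Under these assumptions a coding session would need around twenty dynamic additions before profiling loses on tokens alone, though the round-trip latency of each addition bites long before that.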
What changes in practice
The token savings are real but not dramatic. Saving 2,500 to 3,000 tokens per session matters most in long sessions where the context window is already under pressure, which is exactly the scenario where MemoryCrux is most valuable. The agent that's deep into a complex debugging session, with thousands of tokens of code and error traces in the window, is the agent that most needs its memory tools, and the agent that can least afford to waste context on tool descriptions it will never use.
The accuracy improvement is harder to measure but easier to feel. An agent choosing between seven clearly distinct tools makes better decisions than an agent choosing between twenty-eight overlapping ones. The descriptions can be shorter and more specific because they don't need to disambiguate against as many neighbours. The parameter schemas can include better defaults because the session type provides implicit context.
And the time savings compound. Fewer tools means faster tool selection. Better tool selection means fewer wasted calls. Fewer wasted calls means more of the session spent on actual work.
The honest part
I don't have production numbers yet. This is architecture, not evidence. The session profiles are designed based on observed patterns, but observed patterns from a single user (me) working on a single platform (CueCrux). Whether these profiles generalise to other users and other knowledge bases is an open question.
The meta-tool escape hatch adds complexity. Every dynamic tool addition is a round trip. If an agent needs three tools that aren't in its profile, that's three round trips before it can do its actual work. There's a threshold where the overhead of dynamic loading exceeds the savings from profiling, and I don't yet know where that threshold sits.
What I do know is that the current approach, loading twenty-eight tools into every session regardless of what that session needs, is measurably wasteful. It wastes tokens, it wastes time, and it degrades tool selection accuracy. The question isn't whether to optimise. It's how aggressively.
For now, the answer is: conservatively. Start with three profiles. Measure tool call patterns across a month of real sessions. Adjust the profiles based on data, not intuition. And keep the escape hatch wide enough that no session is ever stuck without the tool it needs.
Twenty-eight tools. Three to five per session. The infrastructure to serve the right ones at the right time is the difference between a context window that works for the agent and a context window that works against it.