How to cut LLM costs on AI agent tool selection

An agent spends most of its budget on one repeated decision: given the current state, which tool do I call next? That choice carries the full tool list and context, so it is the most expensive call in the loop, and it repeats across runs. Here is how to route it cheaply.

Short answer

Tool selection is the most expensive and most repetitive step in an agent loop. Route the decisions you can certify against the agent's own history through a small classifier, defer the unfamiliar states to the LLM, and cut end-to-end agent cost while keeping the agent's behaviour intact.

Why tool selection dominates agent cost

Each step of an agent loop asks the model the same kind of question: here are the goal, the history, and the available tools; what is the next action? That prompt is large because it has to carry the tool schemas and the running context, and it fires on every step of every run. Across a workload, the agent re-derives the same routing decisions constantly. The reasoning is real, but most of it is repeat work.

Which tool-selection steps are safe to route

Collect the agent's past states and the tool it chose for each, then group by that choice and check consistency on held-out states. The states that map cleanly to one tool (a lookup that always goes to search, a date question that always goes to the calendar tool) form tight regions that a classifier can reproduce. The states where the agent legitimately branches stay on the LLM.
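As a rough illustration of that consistency check, here is a minimal sketch in plain Python. It assumes each logged state has already been reduced to a coarse key such as "lookup" or "date"; that key, the function names, and the toy data are all hypothetical, and this is not how TRACER itself builds its regions.

```python
from collections import Counter, defaultdict

def dominant_tools(train):
    """Map each state key to the tool the agent chose most often for it."""
    counts = defaultdict(Counter)
    for key, tool in train:
        counts[key][tool] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def held_out_agreement(train, held_out):
    """Per state key, return (matches, total) on held-out steps."""
    dominant = dominant_tools(train)
    result = defaultdict(lambda: [0, 0])
    for key, tool in held_out:
        if key not in dominant:
            continue  # unseen key: nothing to certify, it stays on the LLM
        result[key][1] += 1
        if tool == dominant[key]:
            result[key][0] += 1
    return {key: tuple(pair) for key, pair in result.items()}

# Toy data (made up): "lookup" and "date" are consistent, "plan" branches.
train = (
    [("lookup", "search")] * 3
    + [("date", "calendar")] * 2
    + [("plan", "search"), ("plan", "code"), ("plan", "search")]
)
held_out = [
    ("lookup", "search"), ("lookup", "search"),
    ("date", "calendar"),
    ("plan", "code"), ("plan", "search"),
]
agreement = held_out_agreement(train, held_out)
```

Keys with full held-out agreement are candidates for the cheap path; a key like "plan", where the agent splits between tools, is the kind of state that stays on the LLM.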

How much can you actually save

Because the routing step is both the costliest and the most repetitive part of the loop, certifying it pays off more than trimming any other stage. In our getclaw case study, rewiring the Hermes agent framework to route tool selection through a TRACER classifier cut end-to-end agent cost by around 50 percent. The agent kept its behaviour, the cheap path handled the routine decisions, and the LLM stayed in the loop for the rest.

Step                      | Before            | After
Routine tool choice       | Frontier LLM call | Small classifier, near-zero
Genuine fork or new state | Frontier LLM call | Frontier LLM call (deferred)
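To reason about your own numbers, the cost of the tool-selection call can be modelled as a blend of the two paths, weighted by coverage (the share of steps the classifier handles). The sketch below is back-of-envelope arithmetic with made-up inputs, not figures from the case study, and it covers only the selection call, not tool execution or other stages of the loop.

```python
def blended_cost(coverage, llm_cost, classifier_cost):
    """Average cost per tool-selection step when a share `coverage` is routed."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return coverage * classifier_cost + (1.0 - coverage) * llm_cost

def savings_fraction(coverage, llm_cost, classifier_cost):
    """Fraction of tool-selection spend saved versus sending every step to the LLM."""
    return 1.0 - blended_cost(coverage, llm_cost, classifier_cost) / llm_cost

# Hypothetical inputs: 60% of steps routed, $0.02 per LLM call,
# $0.0001 per classifier call.
example_savings = savings_fraction(0.6, 0.02, 0.0001)
```

Because the classifier cost is close to zero, the saving on this step tracks coverage closely, so the share of decisions that certify is the number to watch.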

How do you keep the agent correct

Each region of states carries a calibrated lower bound on how often the classifier matches the agent's own past tool choice, on held-out data. A region routes to the cheap path only when that bound clears your target. The rest defer to the LLM. You can inspect any region to see the dominant tool, real example states, and the error bound, so you can sign off on what the classifier is allowed to decide. For why this matters to the economics of an AI product, see the AI margin problem.
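One common way to turn held-out agreement counts into a conservative lower bound is the Wilson score interval. The sketch below uses it purely as an illustration; the source does not say which bound TRACER computes, so treat the choice of Wilson, the z value, and the function names as assumptions.

```python
import math

def wilson_lower_bound(matches, total, z=1.96):
    """Lower end of the Wilson score interval for an agreement rate.

    z=1.96 corresponds to a roughly 97.5% one-sided bound.
    """
    if total == 0:
        return 0.0
    p = matches / total
    denom = 1.0 + z * z / total
    centre = p + z * z / (2.0 * total)
    margin = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total))
    return (centre - margin) / denom

def clears_target(matches, total, target):
    """A region routes to the cheap path only if its lower bound clears the target."""
    return wilson_lower_bound(matches, total) >= target

small_region = clears_target(10, 10, target=0.95)    # perfect, but only 10 examples
large_region = clears_target(200, 200, target=0.95)  # perfect on 200 examples
```

The point of a lower bound rather than a raw agreement rate is visible here: a region with 10 matches out of 10 does not clear a 0.95 target, while one with 200 out of 200 does, because the small sample leaves too much uncertainty.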

How to cut the cost, step by step

What you need: a few thousand logged agent steps, each with the state the agent saw and the tool it chose. No manual labelling. The agent's own trajectory is the training signal.
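If you are not logging these pairs yet, something as simple as an append-only JSON Lines file is enough to start. This is a minimal sketch; the field names (ts, run_id, state, tool) and the helper names are hypothetical, and TRACER may expect a different input format.

```python
import json
import time

def log_step(path, run_id, state, tool):
    """Append one agent step (what the agent saw, which tool it chose) as a JSON line."""
    record = {"ts": time.time(), "run_id": run_id, "state": state, "tool": tool}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

def load_steps(path):
    """Read the logged steps back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Call log_step right after the agent picks a tool, for example log_step("steps.jsonl", "run-1", prompt_text, "search"), so that the logged state is exactly what the model saw when it made the choice.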

  1. Log your state-and-tool pairs

    Capture each agent step: the state and context the agent saw, and the tool it called next.

  2. Build the partition

    Run pip install tracer-llm and fit on the logged steps. TRACER groups states by the tool chosen, then learns which tool a new state maps to.

  3. Read the certified decisions

    See which routing decisions clear your target agreement on held-out states. Each region shows its dominant tool, real example states, and its error bound.

  4. Slot the classifier in front of tool selection

    Route the certified decisions through the small classifier and defer genuine forks and unfamiliar states to the LLM. The out-of-distribution gate handles anything new.

  5. Meter and re-certify

    Track live coverage, savings, and agreement. As the agent's workload evolves, re-fit so the guarantee keeps holding.
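Steps 4 and 5 can be sketched as a thin wrapper around the existing tool-selection call. Everything here is a stand-in under stated assumptions: classify, is_certified, is_familiar, and ask_llm are hypothetical callables you would supply, and the Meter class is illustrative rather than part of the TRACER library.

```python
from dataclasses import dataclass

@dataclass
class Meter:
    """Counts how many steps took the cheap path versus the LLM."""
    routed: int = 0
    deferred: int = 0

    @property
    def coverage(self):
        total = self.routed + self.deferred
        return self.routed / total if total else 0.0

def select_tool(state, classify, is_certified, is_familiar, ask_llm, meter):
    """Use the classifier only for familiar states in certified regions."""
    if is_familiar(state):
        tool, region = classify(state)
        if is_certified(region):
            meter.routed += 1
            return tool
    # Unfamiliar state or uncertified region: defer to the LLM as before.
    meter.deferred += 1
    return ask_llm(state)

# Toy stand-ins (all hypothetical) to show the control flow.
def demo_classify(state):
    if state.startswith("find"):
        return ("search", "lookup")
    return ("run_code", "open_ended")

meter = Meter()
handlers = dict(
    classify=demo_classify,
    is_certified=lambda region: region == "lookup",
    is_familiar=lambda state: len(state) < 80,
    ask_llm=lambda state: "llm_choice",
    meter=meter,
)
first = select_tool("find the refund policy", **handlers)
second = select_tool("plan a database migration", **handlers)
```

Keeping the LLM call as the fallback means the wrapper can be rolled out with no certified regions at first, then opened up region by region as each one is signed off.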

The open-source library runs locally and slots in front of the tool-selection call. The hosted version meters live traffic and activates a certified region in one click at app.tracerml.ai.

Frequently asked questions

Why is tool selection the expensive part of an agent?

Every step of an agent loop calls the LLM to decide what to do next, and that decision usually dominates the token count because it carries the full tool list and context. Most of those decisions repeat across runs, so you are paying a frontier model to re-derive the same routing choice again and again.

How much can routing tool selection save?

In the getclaw case study, rewiring the Hermes agent framework to route tool selection through a TRACER classifier cut end-to-end agent cost by around 50 percent, because the routing step is both the most expensive and the most repetitive part of the loop.

Does the agent still behave the same way?

Yes, within the agreement target you set. The classifier replaces the LLM only on tool-selection decisions it can certify against the agent's own past behaviour. Anything ambiguous or unfamiliar defers to the LLM, so the agent's policy is unchanged on the hard steps.

What about a step the agent has not seen before?

It defers to the LLM. An out-of-distribution gate routes any state that does not resemble certified traffic back to the model, so a novel situation is handled by the full agent rather than a cheap guess.
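To make the idea of a gate concrete, here is a deliberately simple stand-in that compares a new state to certified example states by word overlap (Jaccard similarity). It illustrates the concept only: the similarity measure, the 0.5 threshold, and the function names are assumptions, and TRACER's actual gate is not described here.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Share of words the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def make_gate(certified_examples, threshold=0.5):
    """Build a check that passes only states resembling a certified example."""
    example_tokens = [tokens(example) for example in certified_examples]

    def is_familiar(state):
        state_tokens = tokens(state)
        return any(jaccard(state_tokens, ex) >= threshold for ex in example_tokens)

    return is_familiar

gate = make_gate([
    "what is the weather in paris",
    "what date is next friday",
])
seen_before = gate("what is the weather in berlin")  # close to a certified example
novel = gate("refactor the billing module")          # resembles nothing certified
```

A production gate would more likely work on embeddings than on raw words, but the contract is the same: when the gate says no, the state goes to the LLM.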

TRACER is open source. Run pip install tracer-llm, point it at your agent's tool-selection traces, and see which decisions certify. The hosted version adds a live meter and one-click activation at app.tracerml.ai.
