How to cut LLM costs on AI agent tool selection
An agent spends most of its budget on one repeated decision: given the current state, which tool should it call next? That prompt carries the full tool list and the running context, so it is the most expensive call in the loop, and it repeats across runs. Here is how to route it cheaply.
Short answer
Tool selection is the most expensive and most repetitive step in an agent loop. Route the decisions you can certify against the agent's own history through a small classifier, defer the unfamiliar states to the LLM, and cut end-to-end agent cost while keeping the agent's behaviour intact.
Why tool selection dominates agent cost
Each step of an agent loop asks the model the same kind of question: here is the goal, the history, and the available tools; what is the next action? That prompt is large because it has to carry the tool schemas and the running context, and it fires on every step of every run. Across a workload, the agent re-derives the same routing decisions constantly. The reasoning is real, but most of it is repeat work.
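A rough token count for one run shows why this prompt dominates. This is a sketch under stated assumptions, not a measurement: it assumes every step re-sends the full tool schemas and the goal plus all history so far, and that history grows by a fixed number of tokens per step. The function name and the example numbers are hypothetical.

```python
def run_input_tokens(steps: int, schema_tokens: int, goal_tokens: int,
                     history_tokens_per_step: int) -> int:
    """Total prompt tokens sent to the LLM over one agent run.

    Hypothetical model: each step re-sends the tool schemas and the goal,
    plus all history accumulated so far, which grows by a fixed amount per step.
    """
    total = 0
    for step in range(steps):
        total += schema_tokens + goal_tokens + step * history_tokens_per_step
    return total


if __name__ == "__main__":
    # Illustrative numbers only: 10 steps, 3,000 tokens of tool schemas,
    # 200 tokens of goal, 400 tokens of new history per step.
    total = run_input_tokens(10, 3_000, 200, 400)
    schemas = 10 * 3_000
    print(total, f"{schemas / total:.0%} of input tokens are re-sent tool schemas")
```

The schema term grows linearly with the number of steps and the history term roughly with its square, and both are paid again on every run.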
Which tool-selection steps are safe to route
Collect the agent's past states and the tool it chose, then group the states by that choice and check consistency on held-out states. States that map cleanly to one tool (a lookup that always goes to search, a date question that always goes to the calendar tool) form tight regions that a classifier can reproduce. States where the agent legitimately branches between tools stay on the LLM.
- Safe to route: states that map to a single tool almost every time across runs.
- Keep on the LLM: genuine forks, multi-tool plans, and any state that looks unlike the agent's normal trajectory.
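The consistency check can be sketched in a few lines. This is not TRACER's implementation: it assumes each logged step has already been assigned a region label (a hypothetical `region_id`, for example from clustering state embeddings), takes the dominant tool per region on a training split, and counts how often that tool matches the agent's own choice on held-out steps.

```python
from collections import Counter, defaultdict

# (region_id, tool) pairs; region_id is a hypothetical cluster label for the state.
Step = tuple[str, str]


def dominant_tool_per_region(train: list[Step]) -> dict[str, str]:
    """Most frequent tool the agent chose in each region on the training split."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for region, tool in train:
        counts[region][tool] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}


def heldout_agreement(train: list[Step], heldout: list[Step]) -> dict[str, tuple[int, int]]:
    """(matches, total) per region: how often the dominant tool matches held-out choices."""
    dominant = dominant_tool_per_region(train)
    stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for region, tool in heldout:
        if region not in dominant:
            continue  # region never seen in training: nothing to predict, leave it to the LLM
        stats[region][0] += int(dominant[region] == tool)
        stats[region][1] += 1
    return {region: (m, n) for region, (m, n) in stats.items()}


if __name__ == "__main__":
    train = [("lookup", "search")] * 9 + [("lookup", "calendar")] + [("date", "calendar")] * 10
    heldout = [("lookup", "search")] * 4 + [("lookup", "calendar")] + [("date", "calendar")] * 5
    for region, (m, n) in heldout_agreement(train, heldout).items():
        print(region, f"{m}/{n}")
```

Regions whose held-out agreement is near 100 percent are the "safe to route" candidates; regions split across several tools are the genuine forks.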
How much can you actually save
Because the routing step is both the costliest and the most repetitive part of the loop, certifying it pays off more than trimming any other stage. In our getclaw case study, rewiring the Hermes agent framework to route tool selection through a TRACER classifier cut end-to-end agent cost by around 50 percent. The agent kept its behaviour, the cheap path handled the routine decisions, and the LLM stayed in the loop for the rest.
| Step | Before | After |
|---|---|---|
| Routine tool choice | Frontier LLM call | Small classifier call, near-zero cost |
| Genuine fork or new state | Frontier LLM call | Frontier LLM call (deferred) |
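The end-to-end saving is capped by how much of total spend tool selection accounts for. A back-of-the-envelope model, assuming the classifier runs on every tool-selection call and routed calls skip the LLM entirely, gives saved fraction = selection_share * (coverage - classifier_cost_ratio). The parameter names and example inputs are hypothetical and are not figures from the getclaw case study.

```python
def end_to_end_savings(selection_share: float, coverage: float,
                       classifier_cost_ratio: float) -> float:
    """Fraction of total agent cost saved by routing certified tool choices.

    selection_share: share of total agent spend that goes to tool-selection calls.
    coverage: share of tool-selection calls that land in certified regions.
    classifier_cost_ratio: classifier cost per call / LLM cost per call.

    Assumes the classifier runs on every tool-selection call (it decides whether
    to defer) and that routed calls skip the LLM entirely.
    """
    for name, value in (("selection_share", selection_share),
                        ("coverage", coverage),
                        ("classifier_cost_ratio", classifier_cost_ratio)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Before: selection_share. After: selection_share * ((1 - coverage) + classifier_cost_ratio).
    return selection_share * (coverage - classifier_cost_ratio)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(f"{end_to_end_savings(0.8, 0.6, 0.01):.1%}")
```

Because the result can never exceed selection_share, a large end-to-end cut is only possible when tool selection dominates spend, which is the premise of this article.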
How do you keep the agent correct
Each region of states carries a calibrated lower bound on how often the classifier matches the agent's own past tool choice, on held-out data. A region routes to the cheap path only when that bound clears your target. The rest defer to the LLM. You can inspect any region to see the dominant tool, real example states, and the error bound, so you can sign off on what the classifier is allowed to decide. For why this matters to the economics of an AI product, see the AI margin problem.
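The certification rule can be sketched with a standard binomial confidence bound. The source does not say which bound TRACER uses; the one-sided Wilson score lower bound below is a swapped-in stand-in, and it assumes held-out steps are independent draws from the same traffic. `certify` and its `target` parameter are hypothetical names.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(matches: int, total: int, confidence: float = 0.95) -> float:
    """One-sided Wilson score lower bound on the true agreement rate."""
    if total == 0:
        return 0.0
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value, about 1.645 at 0.95
    p = matches / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def certify(stats: dict[str, tuple[int, int]], target: float,
            confidence: float = 0.95) -> dict[str, bool]:
    """Route a region to the classifier only if its lower bound clears the target."""
    return {region: wilson_lower_bound(m, n, confidence) >= target
            for region, (m, n) in stats.items()}


if __name__ == "__main__":
    stats = {"lookup": (990, 1000), "plan": (9, 10), "rare": (10, 10)}
    print(certify(stats, target=0.95))
```

A region with 10 matches out of 10 held-out steps still fails a 0.95 target, because its lower bound at 95 percent confidence is only about 0.79: small regions stay on the LLM until they have enough evidence.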
How to cut the cost, step by step
What you need: a few thousand logged agent steps, each with the state the agent saw and the tool it chose. No manual labelling. The agent's own trajectory is the training signal.
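A minimal way to capture these pairs is one JSON line per agent step. The schema (`run_id`, `state`, `tool`) and the helper names are hypothetical, not TRACER's input format; check the library's documentation for what it expects.

```python
import json
import tempfile
from pathlib import Path


def log_step(path: Path, state: str, tool: str, run_id: str) -> None:
    """Append one agent step as a JSON line (hypothetical schema)."""
    record = {"run_id": run_id, "state": state, "tool": tool}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_steps(path: Path) -> list[tuple[str, str]]:
    """Read back (state, tool) pairs, skipping blank lines."""
    pairs = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                pairs.append((record["state"], record["tool"]))
    return pairs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = Path(d) / "steps.jsonl"
        log_step(log, "What is the weather in Paris?", "search", run_id="r1")
        log_step(log, "Book a meeting next Tuesday", "calendar", run_id="r1")
        print(load_steps(log))
```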
- Log your state-and-tool pairs. Capture each agent step: the state and context the agent saw, and the tool it called next.
- Build the partition. Run `pip install tracer-llm` and fit it on the logged steps. TRACER groups states by the tool chosen, then learns which tool a new state maps to.
- Read the certified decisions. See which routing decisions clear your target agreement on held-out states. Each region shows its dominant tool, real example states, and its error bound.
- Slot the classifier in front of tool selection. Route the certified decisions through the small classifier and defer genuine forks and unfamiliar states to the LLM. The out-of-distribution gate handles anything new.
- Meter and re-certify. Track live coverage, savings, and agreement. As the agent's workload evolves, re-fit so the guarantee keeps holding.
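Slotting the classifier in and metering it amount to a thin wrapper around the existing tool-selection call. The sketch below is an illustration under assumptions, not TRACER's API: `assign_region` (a hypothetical function returning the nearest region and its distance), the `max_distance` gate threshold, and the `audit` flag are placeholders for whatever your fitted classifier and gate provide.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Meter:
    """Live counters for the routing layer."""
    routed: int = 0
    deferred: int = 0
    audited: int = 0
    audit_matches: int = 0

    @property
    def coverage(self) -> float:
        """Share of tool-selection calls answered by the classifier."""
        total = self.routed + self.deferred
        return self.routed / total if total else 0.0

    @property
    def live_agreement(self) -> Optional[float]:
        """Agreement between classifier and LLM on audited routed calls."""
        return self.audit_matches / self.audited if self.audited else None


@dataclass
class ToolRouter:
    certified: dict[str, str]  # region_id -> tool, certified regions only
    assign_region: Callable[[str], tuple[str, float]]  # hypothetical: state -> (region, distance)
    llm_select: Callable[[str], str]  # your existing frontier-LLM tool-selection call
    max_distance: float  # hypothetical out-of-distribution threshold
    meter: Meter = field(default_factory=Meter)

    def select(self, state: str, audit: bool = False) -> str:
        region, distance = self.assign_region(state)
        if distance > self.max_distance or region not in self.certified:
            # Unfamiliar state or uncertified region: defer to the LLM.
            self.meter.deferred += 1
            return self.llm_select(state)
        tool = self.certified[region]
        self.meter.routed += 1
        if audit:
            # Spend one LLM call to spot-check the classifier on live traffic.
            self.meter.audited += 1
            self.meter.audit_matches += int(self.llm_select(state) == tool)
        return tool


if __name__ == "__main__":
    def toy_region(state: str) -> tuple[str, float]:
        # Toy stand-in for a fitted classifier plus distance score.
        if "weather" in state:
            return "lookup", 0.1
        return "unknown", 9.0

    router = ToolRouter(certified={"lookup": "search"}, assign_region=toy_region,
                        llm_select=lambda s: "calendar", max_distance=1.0)
    print(router.select("weather in Paris"), router.select("plan my week"))
    print(router.meter.coverage)
```

Calling `select(state, audit=True)` on a small random sample of routed steps pays for an occasional LLM call; a falling `live_agreement` is the signal to re-fit and re-certify.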
The open-source library runs locally and slots in front of the tool-selection call. The hosted version meters live traffic and activates a certified region in one click at app.tracerml.ai.
Frequently asked questions
Why is tool selection the expensive part of an agent?
Every step of an agent loop calls the LLM to decide what to do next, and that decision usually dominates the token count because it carries the full tool list and context. Most of those decisions repeat across runs, so you are paying a frontier model to re-derive the same routing choice again and again.
How much can routing tool selection save?
In the getclaw case study, rewiring the Hermes agent framework to route tool selection through a TRACER classifier cut end-to-end agent cost by around 50 percent, because the routing step is both the most expensive and the most repetitive part of the loop.
Does the agent still behave the same way?
Yes, within the certified bound. The classifier only replaces the LLM on tool-selection decisions it can certify against the agent's own past behaviour, and each of those regions has cleared your agreement target on held-out data. Anything ambiguous or unfamiliar defers to the LLM, so the agent's policy is unchanged on the hard steps.
What about a step the agent has not seen before?
It defers to the LLM. An out-of-distribution gate routes any state that does not resemble certified traffic back to the model, so a novel situation is handled by the full agent rather than a cheap guess.
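One simple way to build such a gate, offered as a stand-in because the source does not describe TRACER's: embed each state (the embedding step is assumed, not shown), store a centroid per region, and treat any state farther from its nearest centroid than a high quantile of training distances as out of distribution. The function names and the 0.99 quantile are hypothetical.

```python
import math
from collections import defaultdict
from typing import Optional

Vector = tuple[float, ...]


def fit_gate(train: list[tuple[Vector, str]], q: float = 0.99) -> tuple[dict[str, Vector], float]:
    """Return per-region centroids and a distance threshold from training data.

    train holds (embedding, region_id) pairs; both are assumed to come from elsewhere.
    """
    if not train:
        raise ValueError("need at least one training point")
    groups: dict[str, list[Vector]] = defaultdict(list)
    for vec, region in train:
        groups[region].append(vec)
    centroids = {
        region: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for region, vecs in groups.items()
    }
    # Distance of every training point to its own region's centroid.
    dists = sorted(math.dist(vec, centroids[region]) for vec, region in train)
    # Empirical q-quantile: the smallest distance covering at least q of the training points.
    idx = min(len(dists) - 1, max(0, math.ceil(q * len(dists)) - 1))
    return centroids, dists[idx]


def gate(vec: Vector, centroids: dict[str, Vector], threshold: float) -> Optional[str]:
    """Nearest region if the state is close enough, else None (defer to the LLM)."""
    region, dist = min(((r, math.dist(vec, c)) for r, c in centroids.items()),
                       key=lambda pair: pair[1])
    return region if dist <= threshold else None
```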
TRACER is open source. Run `pip install tracer-llm`, point it at your agent's tool-selection traces, and see which decisions certify. The hosted version adds a live meter and one-click activation at app.tracerml.ai.