getclaw (getclaw.sh) is an agentic product running on Hermes, an open-source agent framework. Inside agentic workflows, the LLM makes many discrete tool-call decisions per task: which tool to use, which file to read, which action to take next.

How much agent cost did getclaw save with TRACER?

End-to-end agent cost dropped about 50% with no degradation on the measured traces. At getclaw, tool-selection calls were the dominant cost line of the agent loop. The cost mix varies across agent harnesses, but tool selection is a very common offender.

Why is agent tool selection a classification problem?

Once an agent has run for a while, the same tool-call decisions repeat: 'is this a read, a search, or a write?', 'should I use the file tool or the shell?', 'do I need to ask a follow-up?'. These are discrete choices over a small action space, the textbook definition of classification. The LLM is overqualified for them, and TRACER replaces the predictable ones with a local ML model. Note that this isn't universal across agent frameworks: depending on the harness, the dominant cost can also live in reasoning, retrieval, or generation. Always measure first.

How does TRACER plug into an agent framework?

Integration depends on the framework. For Hermes we shipped a native plugin registered through Hermes's own plugin interface, not a wrapper. For other agent stacks (LangGraph, CrewAI, custom loops) TRACER ships as an OpenAI-compatible HTTP endpoint or an in-process Python handle. In every case the routing shape is the same: TRACER returns a tool name plus a confidence, and defers back to the LLM below the parity threshold. No agent rewrite required.

Choose · Agentic · 2026 getclaw.sh

Tool selection is classification in disguise. So we replaced it.

getclaw runs an agent built on Hermes (an open-source agent framework). For its workload, tool-selection calls were the dominant cost line. We shipped a native Hermes plugin that routes those calls through a TRACER classifier instead of the LLM. End-to-end agent cost dropped ~50% with no degradation on the measured traces. The cost mix varies by harness; tool selection is a very common offender.

E2E agent cost: −50% end-to-end, in production, measured on real traces

Quality delta: 0, no degradation on the measured traces

Integration: 1 plugin, a native Hermes plugin, no agent rewrite

Figure (animation): Hermes agent loop with TRACER tool-selection routing. The agent's tool-call decisions are handled by a local classifier instead of an LLM call; the agent loop is unchanged.

The hidden cost - at getclaw

In Hermes, decisions dominated the bill.

When we instrumented getclaw's loop call-by-call, tool-selection calls were the largest cost line, not the user-facing generation. This isn't universal across agent harnesses: the mix varies with how each framework structures reasoning, retrieval, and generation. But it was the dominant pattern here, and it's a very common one.
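One way to take that kind of measurement is to tag every LLM call with the decision it serves and sum the cost per tag. The sketch below is only an illustration: the `LLMCall` record, the call kinds, the token counts, and the per-token prices are all invented for the example and are not getclaw's data or Hermes's logging format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LLMCall:
    kind: str           # hypothetical label, e.g. "tool_selection" or "generation"
    input_tokens: int
    output_tokens: int


def cost_by_kind(calls, price_in, price_out):
    """Sum the cost of each call kind; prices are per token."""
    totals = defaultdict(float)
    for call in calls:
        totals[call.kind] += (call.input_tokens * price_in
                              + call.output_tokens * price_out)
    return dict(totals)


def cost_shares(totals):
    """Convert absolute costs into fractions of the total bill."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: cost / grand_total for kind, cost in totals.items()}


# Illustrative log only: these numbers are made up.
calls = [
    LLMCall("tool_selection", 3000, 20),
    LLMCall("tool_selection", 3200, 20),
    LLMCall("tool_selection", 3100, 20),
    LLMCall("generation", 1500, 400),
]
totals = cost_by_kind(calls, price_in=1e-6, price_out=4e-6)
shares = cost_shares(totals)
dominant = max(shares, key=shares.get)
print(dominant, round(shares[dominant], 2))
```

Tool-selection prompts tend to carry the full state and tool list as input while emitting only a tool name, which is why input tokens and output tokens are priced separately here.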

Step 1 "Which tool should I use?" LLM call. Output: one of N tools. Pure classification over a small action space.

Step 2 "Do I have what I need?" LLM call. Output: yes/no/needs-clarification. Three-class triage decision.

Step 3 "Should I escalate?" LLM call. Output: continue/handoff/abort. Routing over the next-action space.

Step N "Is this done?" LLM call. Output: done/loop. Binary classifier dressed up as reasoning.

Multiply by the number of steps per task, the number of tasks per day, and the price per token. Tool-selection is where agent inference budgets quietly bleed out, and where the same kinds of decisions repeat thousands of times.
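Written out, that multiplication looks like the sketch below. Every figure in it (steps per task, tasks per day, tokens per call, price, and the share of calls served locally) is a hypothetical placeholder; it covers only the decision calls, not the end-to-end bill, and treats local inference as free for simplicity.

```python
def daily_decision_cost(steps_per_task, tasks_per_day, tokens_per_call,
                        price_per_million_tokens):
    """Daily spend on decision calls: steps x tasks x tokens x price."""
    total_tokens = steps_per_task * tasks_per_day * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million_tokens


def cost_after_routing(baseline, local_fraction):
    """Spend left when a fraction of calls is answered by a local model."""
    return baseline * (1.0 - local_fraction)


# Hypothetical workload, chosen only to make the arithmetic visible.
baseline = daily_decision_cost(
    steps_per_task=12,
    tasks_per_day=5000,
    tokens_per_call=3000,
    price_per_million_tokens=1.0,
)
remaining = cost_after_routing(baseline, local_fraction=0.6)
print(baseline, round(remaining, 2))
```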

The integration

A native Hermes plugin, registered through Hermes's own interface

Not a wrapper. We built TRACER as a native plugin in Hermes's plugin interface, the same surface Hermes uses for its own internals. Where the framework previously called an LLM to pick the next tool, it now calls the TRACER plugin first, and falls back to the LLM only below the parity threshold:

# before - LLM picks the tool
tool_name = llm.complete(
    prompt=tool_selection_prompt(state),
    tools=available_tools,
).tool

# after - Hermes plugin routes through TRACER first, defers hard cases
decision = hermes.plugins.tracer.route(state)
if decision.confidence >= 0.95:
    tool_name = decision.tool      # local classifier · ~free
else:
    tool_name = llm.complete(...).tool   # defer to teacher LLM

Same classifier core, same parity gate. The agent loop is unchanged. The LLM is still in the stack; it just stops handling decisions a small ML model can handle.
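For readers who want to try the gating pattern outside Hermes, here is a self-contained sketch of the same shape. It does not use the TRACER plugin or any Hermes API: `route_tool`, `Decision`, and the two stand-in functions are hypothetical names, and only the 0.95 threshold is taken from the snippet above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    tool: str
    confidence: float
    source: str  # "classifier" or "llm"


def route_tool(state: dict,
               classify: Callable[[dict], Tuple[str, float]],
               ask_llm: Callable[[dict], str],
               threshold: float = 0.95) -> Decision:
    """Use the local classifier when it is confident, else defer to the LLM."""
    tool, confidence = classify(state)
    if confidence >= threshold:
        return Decision(tool, confidence, "classifier")
    return Decision(ask_llm(state), confidence, "llm")


# Stand-ins for a trained classifier and a teacher LLM (assumptions).
def fake_classifier(state):
    if state.get("intent") == "open_file":
        return "file_read", 0.99
    return "shell", 0.40


def fake_llm(state):
    return "web_search"


easy = route_tool({"intent": "open_file"}, fake_classifier, fake_llm)
hard = route_tool({"intent": "something_new"}, fake_classifier, fake_llm)
print(easy.source, easy.tool)
print(hard.source, hard.tool)
```

Recording the `source` field on each decision makes it easy to report afterwards what fraction of calls the classifier actually absorbed.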

Why it compounds

Agents are the perfect TRACER workload

Four properties make agentic tool selection unusually well-suited:

Repetition: the same decisions, again and again. An agent doing the same kind of task hits the same tool-selection states thousands of times.

Small action space: N is tiny. Most agents have fewer than 20 tools. A classifier over 20 classes is a solved problem.

Structured output: tool name + arguments. Discrete, bounded, never a paragraph. Exactly what a small ML model is built for.

Already labelled: the LLM's own past decisions. Every tool call your agent has ever made is a teacher-labelled (state, action) pair. No manual labelling.
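To make the "already labelled" point concrete, the sketch below turns logged tool calls into (state, action) pairs and fits a deliberately simple frequency table to them. The frequency table is a stand-in chosen to keep the example dependency-free; this page does not describe TRACER's actual model, and the log fields (`last_tool`, `intent`, `chosen_tool`) are hypothetical.

```python
from collections import Counter, defaultdict


class FrequencyClassifier:
    """Toy stand-in: predicts the action the teacher chose most often."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, pairs):
        for state_key, action in pairs:
            self.counts[state_key][action] += 1
        return self

    def predict(self, state_key):
        seen = self.counts.get(state_key)
        if not seen:
            return None, 0.0  # unseen state: no opinion, the caller should defer
        action, hits = seen.most_common(1)[0]
        return action, hits / sum(seen.values())


def to_state_key(record):
    # Hypothetical featurisation: reduce a logged state to a hashable key.
    return (record["last_tool"], record["intent"])


# Invented log of past LLM tool choices; the LLM's choice is the label.
logged_calls = [
    {"last_tool": None, "intent": "find_symbol", "chosen_tool": "search"},
    {"last_tool": None, "intent": "find_symbol", "chosen_tool": "search"},
    {"last_tool": "search", "intent": "inspect", "chosen_tool": "file_read"},
    {"last_tool": "search", "intent": "inspect", "chosen_tool": "file_read"},
    {"last_tool": "search", "intent": "inspect", "chosen_tool": "shell"},
]
pairs = [(to_state_key(r), r["chosen_tool"]) for r in logged_calls]
model = FrequencyClassifier().fit(pairs)
print(model.predict((None, "find_symbol")))
print(model.predict(("search", "inspect")))
```

The confidence returned here is just the teacher's empirical agreement with itself on that state, so mixed states naturally fall below a high threshold and get deferred.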

The result

Same agent. Same quality. Half the cost.

On the measured traces, end-to-end agent cost dropped ~50% with no observable degradation. The savings compound: every deferred LLM call becomes a new (state, action) trace that retrains the classifier and lets it handle the next case locally.
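A claim like "no degradation on the measured traces" can be checked by replaying held-out traces and comparing the classifier's choice with the LLM's at a given confidence threshold. The sketch below shows one plausible way to do that and to pick a threshold; it is an assumption about how such a parity check could work, not a description of TRACER's gate, and the records are invented.

```python
def evaluate_threshold(records, threshold):
    """records: (confidence, classifier_tool, teacher_tool) triples.

    Returns (coverage, agreement) for the calls handled locally.
    """
    handled = [r for r in records if r[0] >= threshold]
    if not handled:
        return 0.0, 1.0  # nothing handled locally, so nothing can disagree
    agree = sum(1 for _, predicted, teacher in handled if predicted == teacher)
    return len(handled) / len(records), agree / len(handled)


def pick_threshold(records, candidates, min_agreement):
    """Lowest candidate whose locally handled calls meet the agreement target."""
    for threshold in sorted(candidates):
        coverage, agreement = evaluate_threshold(records, threshold)
        if agreement >= min_agreement:
            return threshold, coverage, agreement
    return None


# Invented held-out traces: classifier confidence, its pick, the LLM's pick.
held_out = [
    (0.99, "search", "search"),
    (0.97, "file_read", "file_read"),
    (0.96, "shell", "shell"),
    (0.90, "shell", "file_read"),
    (0.85, "search", "search"),
    (0.60, "file_read", "shell"),
]
choice = pick_threshold(held_out, candidates=[0.5, 0.8, 0.95], min_agreement=1.0)
print(choice)
```

Coverage is the lever that drives savings and agreement is the quality constraint; lowering the threshold trades one for the other.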

"Tool selection turned out to be where most of our budget was going. Tracer let us treat it like the classification problem it always was."

Different framework? Different integration surface.

For Hermes we shipped a native plugin. For other agent stacks (LangGraph, CrewAI, custom loops) TRACER ships as an OpenAI-compatible endpoint or an in-process Python handle. Whatever your agent uses to pick the next tool, the routing shape is the same. Start by measuring which calls actually dominate your bill.


Adam Rida Founder · TRACER · github.com/adrida