How to cut LLM costs on news and content classification

Media monitoring and trading-signal pipelines run an LLM over every article or headline to classify topic, relevance, or intent. Feeds are high-volume and never stop, so the cost scales with the stream. Here is how to classify the clear-cut items with a near-free classifier and keep the LLM for the genuinely novel ones.

Short answer

News and content classification is high-volume and repetitive. Route the clear-cut items through a small classifier you have certified against your own LLM (the teacher), and defer the novel or ambiguous ones to that LLM. You cut the cost of classifying the bulk of your feed while holding accuracy against the teacher.

Why classifying every item with an LLM gets expensive

A monitoring or signal pipeline calls a language model on each item to assign a topic or a relevance label. A clear market-moving headline is one call. A clear off-topic item is another. The model is doing real work, and most of that work repeats as the same kinds of stories cycle through the feed. On a continuous feed, the classification step alone is a standing cost that grows with volume.

Which items are safe to classify with a cheap model

Group your past items by the label your LLM assigned, then check how consistent each group is on held-out items. The clear topics and obvious relevance calls form tight regions a small model reproduces. The breaking, the ambiguous, and the genuinely novel stay on the LLM.
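The consistency check above can be sketched in a few lines. This is a minimal illustration, not the TRACER API: `agreement_by_label` and the toy labels are hypothetical names, and the input is assumed to be held-out pairs of (teacher label, cheap-model label).

```python
from collections import defaultdict


def agreement_by_label(pairs):
    """Group held-out items by the teacher (LLM) label and count how often
    the cheap model reproduces it.

    pairs: iterable of (teacher_label, cheap_label) tuples.
    Returns {teacher_label: (matches, total, agreement_rate)}.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for teacher_label, cheap_label in pairs:
        totals[teacher_label] += 1
        matches[teacher_label] += int(teacher_label == cheap_label)
    return {
        label: (matches[label], totals[label], matches[label] / totals[label])
        for label in totals
    }


# Toy held-out set (hypothetical labels, for illustration only).
held_out = [
    ("earnings", "earnings"), ("earnings", "earnings"), ("earnings", "earnings"),
    ("off_topic", "off_topic"), ("off_topic", "off_topic"),
    ("breaking", "earnings"), ("breaking", "breaking"),
]
stats = agreement_by_label(held_out)
for label, (hit, total, rate) in sorted(stats.items()):
    print(f"{label}: {hit}/{total} agree ({rate:.0%})")
```

In this toy data the tight groups agree on every held-out item, while the breaking group agrees only half the time and would stay on the LLM.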

How much can you actually save

The savings equal the share of traffic you can certify multiplied by the price gap between your teacher model and a small classifier. The classifier cost is close to zero next to a frontier call, so the certified share is the number that matters. The reference point on this site is the Obside case study, which has the exact shape of this problem: news classified by intent for automated trading. There, a 38-cell surrogate replaced one frontier call per item, with 95 percent saved and 99.9 percent routed accuracy.
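As a worked example of that arithmetic, here is a small sketch. The prices and the certified share below are made-up inputs chosen for illustration, not measurements, and the formula assumes deferred items pay only the teacher price.

```python
def savings_per_item(certified_share, teacher_cost, cheap_cost):
    """Average saving per item: certified share times the price gap."""
    return certified_share * (teacher_cost - cheap_cost)


def savings_fraction(certified_share, teacher_cost, cheap_cost):
    """Saving as a fraction of the all-LLM baseline cost."""
    return savings_per_item(certified_share, teacher_cost, cheap_cost) / teacher_cost


# Hypothetical inputs: 80% of traffic certified, $0.002 per frontier call,
# $0.00001 per small-classifier call.
share, teacher, cheap = 0.80, 0.002, 0.00001
per_item = savings_per_item(share, teacher, cheap)
fraction = savings_fraction(share, teacher, cheap)
print(f"saved per item: ${per_item:.6f}  ({fraction:.1%} of baseline)")
```

Because the cheap cost is tiny next to the teacher cost, the saved fraction lands just under the certified share, which is why the certified share dominates.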

| Item | Before | After |
| --- | --- | --- |
| Certified item | Frontier LLM call | Small classifier, near-zero |
| Rare or ambiguous | Frontier LLM call | Frontier LLM call (deferred) |

How do you prove quality holds

Each region carries a calibrated lower bound on how often the cheap path will match the teacher, computed on held-out items. A region only routes to the small model when that bound clears the target you set. Everything else defers. You get an audit trail per region: the dominant label, real examples, and the error bound, so a desk or research lead can see why a region is safe before any real traffic moves. For why this matters to your unit economics, see the AI margin problem.
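To make the idea of a lower bound concrete, here is one common way to compute one: the Wilson score lower bound on a binomial proportion. This is an assumption chosen for illustration; the text above does not say which bound TRACER uses, and `wilson_lower_bound` and `is_certified` are hypothetical names.

```python
import math


def wilson_lower_bound(matches, total, z=1.96):
    """Wilson score lower bound on the true agreement rate.

    matches / total is the observed held-out agreement with the teacher.
    z=1.96 corresponds to a one-sided 97.5 percent bound.
    """
    if total == 0:
        return 0.0
    p = matches / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def is_certified(matches, total, target, z=1.96):
    """A region routes to the cheap path only if the bound clears the target."""
    return wilson_lower_bound(matches, total, z) >= target


# A large region with 99% observed agreement clears a 98% target...
print(is_certified(990, 1000, target=0.98))
# ...but a tiny region with perfect observed agreement does not clear 95%,
# because ten items are too few to support the claim.
print(is_certified(10, 10, target=0.95))
```

The design point is that the bound, not the raw agreement rate, gates routing, so small regions defer until enough held-out evidence accumulates.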

How to cut the cost, step by step

What you need: a few thousand recent items, each paired with the label your LLM already produced. No hand-labelling. Your own traffic is the training signal.

  1. Export your classification traces

    Pull recent items paired with the label your LLM produced: topic, relevance, or intent.

  2. Build the partition

    Run pip install tracer-llm and fit on your traces. TRACER groups items by the label the LLM assigned, then learns where a new item lands.

  3. Read the certified labels

    See which labels clear your target agreement on held-out items. Each region shows its dominant label, real example items, and its error bound.

  4. Activate the clear-cut labels

    Route the certified labels to the small classifier and keep breaking or ambiguous items on the LLM. The out-of-distribution gate sends novel items back to the teacher model.

  5. Meter and re-certify

    Track live coverage, savings, and agreement. As the news cycle shifts, re-fit so the guarantee keeps holding.
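Steps 4 and 5 above can be pictured as a small routing function with a coverage meter. This is a sketch under stated assumptions, not the tracer-llm API: `route`, `Meter`, and the toy stand-in functions are all hypothetical, and the real classifier, gate, and LLM call would be supplied by your own pipeline.

```python
from dataclasses import dataclass


@dataclass
class Meter:
    """Counts how many items took the cheap path."""
    total: int = 0
    routed_cheap: int = 0

    def record(self, used_cheap: bool) -> None:
        self.total += 1
        self.routed_cheap += int(used_cheap)

    @property
    def coverage(self) -> float:
        return self.routed_cheap / self.total if self.total else 0.0


def route(item, cheap_predict, teacher_predict, certified_labels, in_distribution, meter):
    """Use the cheap label only when the item passes the gate AND the
    predicted label is certified; otherwise defer to the teacher."""
    if in_distribution(item):
        label = cheap_predict(item)
        if label in certified_labels:
            meter.record(used_cheap=True)
            return label, "cheap"
    meter.record(used_cheap=False)
    return teacher_predict(item), "teacher"


# Toy stand-ins so the sketch runs end to end (not real models).
def toy_cheap(item):
    return "earnings" if "earnings" in item else "other"

def toy_teacher(item):
    return "teacher_label"

def toy_gate(item):
    return len(item.split()) >= 3

meter = Meter()
headlines = ["Acme reports record earnings", "Local team wins cup final", "Halt"]
results = [route(h, toy_cheap, toy_teacher, {"earnings"}, toy_gate, meter) for h in headlines]
print(results, f"coverage={meter.coverage:.0%}")
```

Tracking live agreement as well would require sending a sample of cheap-routed items to the teacher and comparing labels, which this sketch leaves out.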

The open-source library runs the whole flow locally. The hosted version adds a live meter, a savings estimate, and one-click activation once a region is certified, at app.tracerml.ai.

Frequently asked questions

Can a small model classify news as well as an LLM?

For the clear-cut items, yes. An obvious topic or relevance call is handled by a small classifier for near-zero cost, in single-digit milliseconds. The breaking or ambiguous items defer to the LLM. You only route an item to the cheap model when a calibrated accuracy bound clears your target.

How much does news classification with an LLM cost at scale?

A continuous feed triggers a call on every item, all day. The classification step becomes a meaningful and constant line item. Moving the clear-cut share to a classifier removes that cost from the bulk of the stream, and the cheap path is fast enough for latency-sensitive pipelines.

Is this proven in production?

Yes. Obside runs exactly this workload, news classified by intent for automated trading, on a TRACER surrogate. A 38-cell surrogate replaced one frontier call per item, with 95 percent saved and 99.9 percent routed accuracy.

What happens to a breaking or unusual story?

It defers to the LLM. An out-of-distribution gate routes anything that does not resemble certified traffic back to the teacher model, so a novel story is never classified by the cheap path on a guess.
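One simple way to picture such a gate, offered as an illustrative assumption and not as a description of how TRACER implements it, is a distance check against centroids of certified traffic in some embedding space. The function names, the two-dimensional vectors, and the threshold below are all hypothetical.

```python
import math


def nearest_centroid_distance(vec, centroids):
    """Euclidean distance from an item's embedding to the closest centroid."""
    return min(math.dist(vec, c) for c in centroids)


def in_distribution(vec, centroids, max_distance):
    """An item passes the gate only if it sits close to certified traffic."""
    return nearest_centroid_distance(vec, centroids) <= max_distance


# Toy 2-D embeddings: two certified regions and a distance threshold.
centroids = [(0.0, 0.0), (10.0, 10.0)]
max_distance = 2.0

familiar = (0.5, 0.5)   # close to the first region
novel = (5.0, 5.0)      # far from both regions

print(in_distribution(familiar, centroids, max_distance))
print(in_distribution(novel, centroids, max_distance))
```

The familiar item passes and may take the cheap path; the novel item fails the gate and goes back to the teacher model.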

TRACER is open source. Run pip install tracer-llm, point it at your traces, and see which items certify. The hosted version adds a live meter and one-click activation at app.tracerml.ai.
