Blog

Field notes on cheap, certified routing.

On the economics of running AI in production: why inference cost decides your margins, how to build cost discipline from day one, and how to route the repetitive traffic to near-free models while holding a parity guarantee. Written for founders and the engineers who own the bill.

Thesis · 2026-06-13

The AI margin problem: why scale should cut your cost per call

Per-request inference cost squeezes AI app margins, and the squeeze often worsens as you grow, which caps valuation. Here is how to invert the curve so cost per call falls as volume rises.

Playbook · 2026-06-13

A cost playbook for AI startups: strong model first, observability from day one

Find product-market fit on the strongest model, install observability from your first call even on free credits, then route the repetitive traffic to a near-free model. The traces are the asset.

Use case · 2026-06-13

How to cut LLM costs on customer support ticket triage

Support inboxes are the most repetitive LLM workload most teams run. Route the intents you can certify to a near-free classifier, defer the rest to the LLM, and cut the bill without losing accuracy.

Use case · 2026-06-13

How to cut LLM costs on sales lead qualification

Buyer-intent scoring runs an LLM over every reply. Score the clear-cut leads with a small classifier, defer the borderline ones, and cut the cost of qualifying the bulk of your pipeline.

Use case · 2026-06-13

How to cut LLM costs on AI agent tool selection

Agents spend most of their tokens deciding which tool to call next, and that decision repeats. Route it through a small classifier and cut end-to-end agent cost without changing the agent's behaviour.

Use case · 2026-06-13

How to cut LLM costs on email reply prioritization

Route the clear-cut replies to a small classifier, defer the nuanced ones, and cut the cost of prioritizing your inbox.

Use case · 2026-06-13

How to cut LLM costs on company segmentation and ICP tagging

Route the clear-cut companies to a small classifier, defer the ambiguous ones, and cut the cost of firmographic tagging.

Use case · 2026-06-13

How to cut LLM costs on news and content classification

Route the clear-cut items to a small classifier, defer the rest, and cut the cost of classifying a high-volume feed.

Use case · 2026-06-13

How to cut LLM costs on chatbot and assistant intent detection

Route the common intents to a small classifier, defer the rare and ambiguous ones, and cut the per-turn cost of your bot.

Use case · 2026-06-13

How to cut LLM costs on banking and fintech support

Route the fintech support intents you can certify to a small classifier, defer the rest, and cut the bill while holding accuracy.

Use case · 2026-06-13

How to cut LLM costs on content moderation

Route the clearly benign and clearly violating content to a small classifier, and keep the borderline cases on the LLM and human review.

Use case · 2026-06-13

How to cut LLM costs on e-commerce support and product questions

Route the repeat e-commerce questions to a small classifier, defer the complex ones, and cut the bill at peak traffic.

Use case · 2026-06-13

How to cut LLM costs on question routing and FAQ deflection

Route the predictable questions with a small classifier, defer the novel ones, and cut the cost of the routing step.
