Modern Pricing & UBB

The Token Is a Terrible Pricing Metric (And Everyone Knows It)

Per-token pricing for AI APIs made perfect sense in 2022. You needed a unit to charge for. The unit had to be measurable. Tokens were what the model processed. Done. Ship it.

The problem is that pricing decisions made for developer API launches tend to calcify into the foundation of a business model, and per-token pricing is now the default assumption for most enterprise AI negotiations. Enterprise buyers are asking for volume discounts on tokens. Finance teams are building token consumption forecasts. And everyone knows, in the back of their minds, that tokens are a terrible proxy for the value they're actually getting from AI.

What's Wrong With Tokens as a Value Metric

Let's be specific about the failure modes.

Tokens don't correspond to user-perceived value. A 50-token response that answers a question perfectly is worth more than a 500-token response that rambles to an inconclusive answer. A 10-token classification response ("positive") that routes a customer support ticket correctly is more valuable than a 2,000-token essay that nobody reads. Token count is a measure of compute consumption, not a measure of outcome delivered. The value metric for usage-based pricing should scale with value. Tokens scale with verbosity.

Per-token pricing creates adversarial optimization. Pricepertoken.com tracks per-token rates across major models, and the trend is one-directional: dramatically cheaper over time. More importantly, per-token pricing gives customers a direct financial incentive to compress prompts, cap generation length, and engineer around the cost of the LLM. A developer building on Claude or GPT-4 who pays by the token will write the shortest system prompts possible, which are often not the prompts that generate the best outputs. You've created a pricing model that makes your product less effective for its most budget-conscious users.

Tokens are meaningless to enterprise buyers. The CFO of a 500-person company does not understand what a token is. They cannot sanity-check a $40,000/year token budget. They can understand "AI handles X customer support tickets per month" or "AI generates X content pieces per week." The enterprise sales cycle for per-token AI pricing always includes an education tax — explaining what tokens are and why the customer should care — that per-outcome pricing eliminates entirely.

What Replacements Actually Look Like

The market is converging on several alternatives, each with tradeoffs:

Per-task or per-request

Instead of charging by token, charge by completed task: per document analyzed, per email drafted, per image generated, per query answered. This is the model Anthropic is experimenting with for enterprise Claude applications and what most AI middleware companies (Writer, Jasper, Copy.ai) have adopted for end-user products. It's legible, it scales with usage in a way customers intuitively understand, and it removes the prompt-compression incentive. The challenge: "task" needs to be clearly defined in the contract, and task complexity can vary wildly.
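The contract-definition point above can be made concrete with a small sketch. This is a minimal, hypothetical rate card and invoicing function (the task names and prices are illustrative, not from any vendor's actual pricing); the key design choice is that an event whose task type isn't in the agreed rate card fails loudly rather than being billed at a guessed price, keeping the contract's task definitions authoritative.

```python
from dataclasses import dataclass

# Hypothetical per-task rate card; task names and prices are illustrative only.
RATE_CARD = {
    "document_analyzed": 0.50,
    "email_drafted": 0.10,
    "image_generated": 0.25,
}

@dataclass
class TaskEvent:
    task_type: str
    count: int

def invoice_total(events):
    """Sum per-task charges for a billing period.

    Unknown task types raise instead of being silently priced, so the
    contractually defined task list stays the single source of truth.
    """
    total = 0.0
    for e in events:
        if e.task_type not in RATE_CARD:
            raise ValueError(f"task type not in rate card: {e.task_type}")
        total += RATE_CARD[e.task_type] * e.count
    return round(total, 2)
```

For example, 100 analyzed documents plus 40 drafted emails would invoice at `invoice_total([TaskEvent("document_analyzed", 100), TaskEvent("email_drafted", 40)])`, i.e. 50.00 + 4.00 = 54.00. A production system would use fixed-point arithmetic (e.g. `decimal.Decimal`) rather than floats for money.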

Per-outcome

Charge for resolved tickets, closed support cases, completed code reviews, generated leads that enter pipeline. This is outcome-based pricing and it's both the most aligned model and the hardest to implement. The alignment is perfect — you only make money when the customer gets value. The challenge is measurement and attribution: what counts as a "resolved" ticket? What if the AI answered but the customer wasn't satisfied? Outcome-based pricing requires shared agreement on success metrics, which adds contract complexity and opens surface area for disputes.
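The measurement problem described above is ultimately a definition encoded in code as much as in the contract. Here is a minimal sketch, with entirely illustrative criteria (AI-resolved, not reopened within a window, and meeting a satisfaction threshold when a score exists), of counting billable outcomes; every field name and threshold is an assumption standing in for whatever the two parties actually agree on.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Ticket:
    resolved_by_ai: bool
    reopened_within_7d: bool
    csat: Optional[float]  # post-resolution satisfaction score, if collected

def billable_outcomes(tickets, min_csat=3.0):
    """Count tickets meeting the (illustrative) contractual definition of
    'resolved': closed by the AI, not reopened within the agreed window,
    and at or above the CSAT threshold when a score was collected."""
    def qualifies(t):
        if not t.resolved_by_ai or t.reopened_within_7d:
            return False
        return t.csat is None or t.csat >= min_csat
    return sum(1 for t in tickets if qualifies(t))
```

Notice how much contract surface even this toy version exposes: does a ticket with no CSAT score count? How long is the reopen window? Each branch in `qualifies` is a clause someone has to negotiate.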

Seats + AI allocation

Some companies are solving the problem by treating AI usage as a component of a seat-based model: $X per seat per month includes AI allocations appropriate for that user profile, with overage at a per-task or per-token rate. This is pragmatic: it preserves procurement's ability to budget for known headcount while adding AI capacity without requiring a fundamentally new pricing conversation. It's a transitional model — good for today, probably replaced by pure outcome pricing as measurement matures.
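The arithmetic of this hybrid model is simple enough to sketch in a few lines. All numbers below are illustrative assumptions, not anyone's published pricing; the structure is seats times seat price, plus per-task overage beyond a pooled allocation.

```python
def monthly_bill(seats, tasks_used, *,
                 seat_price=30.0,        # $/seat/month (illustrative)
                 included_per_seat=200,  # AI tasks bundled with each seat
                 overage_rate=0.05):     # $/task beyond the pooled allocation
    """Seat-based bill with a pooled AI allocation and per-task overage."""
    included = seats * included_per_seat
    overage_tasks = max(0, tasks_used - included)
    return seats * seat_price + overage_tasks * overage_rate
```

Under these assumed numbers, 10 seats using 2,500 tasks get 2,000 tasks included, so the bill is 10 × $30 plus 500 × $0.05 = $325. Procurement can budget the $300 baseline against headcount, which is exactly why the model is an easy transitional sell.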

a16z's AI pricing research is explicit: per-token pricing is a developer convenience that hasn't yet been displaced by something better for enterprise, not a deliberately optimal choice. The companies that figure out per-outcome pricing at scale for enterprise workflows will have a significant competitive advantage, both in winning deals and in generating better unit economics. Orb has documented the infrastructure requirements for per-task billing: it's more complex than per-token, but it's not out of reach.

The token will be looked back on as the early pricing unit of the AI era, the equivalent of "per MB of data" in early cloud pricing. It served its purpose. Something better is coming.
