Why Your AI Coding Agent Burns Tokens — and How to Stop the Bleed
Autonomous coding agents (OpenClaw, Cline, Aider, Claude Code, Cursor) burn 5-50x more tokens than chat. Here's the architectural reason, and the cheapest fix that doesn't require giving up the agent.
The numbers do not feel right
The first time I sat down with Claude Code on a real project, my Anthropic dashboard showed $11.20 spent in 47 minutes. That's roughly $14/hour just to write code. The model wasn't slow. The agent wasn't broken. It was just doing what agents do.
If you've used Cline, OpenClaw, Aider, Continue, Cursor, or any other agent on a real codebase, you've seen the same thing. Chat is cheap. Agents are not. Why?
Five reasons, in order of impact
1. Context cost grows quadratically
A chat is a conversation: you send a prompt, get a reply, send another. Roughly linear cost.
An agent does not work like that. Each turn, the agent re-sends the entire history: your task, the plan, every file it has read so far, every tool result, every reasoning chunk. By turn 20, the input is 50k tokens; by turn 30, it's 100k. The context itself grows linearly, but each turn re-pays for all of it, so the cumulative cost of a run grows quadratically.
Math: a 30-turn task with 5k tokens added per turn averages ~75k input tokens per turn over the run. At Claude Opus input rates ($5/M), that's ~$0.38 per turn, or ~$11.25 across 30 turns. And that's just input.
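The arithmetic above can be sanity-checked in a few lines. Note the exact sum (average 77.5k tokens/turn) lands slightly above the rounded ~$11.25:

```python
# Sketch of the quadratic cost curve described above, using the numbers
# from the text: 5k tokens appended per turn, 30 turns, $5/M input.
TOKENS_PER_TURN = 5_000
TURNS = 30
PRICE_PER_M = 5.00  # $ per 1M input tokens

# Turn n re-sends everything appended so far: n * 5k tokens of input.
total_input = sum(TOKENS_PER_TURN * turn for turn in range(1, TURNS + 1))
avg_per_turn = total_input / TURNS
cost = total_input / 1e6 * PRICE_PER_M

print(f"avg input/turn: {avg_per_turn:,.0f} tokens")
print(f"total input:    {total_input:,} tokens")
print(f"input cost:     ${cost:.2f}")
```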
2. Agents read files. Whole ones.
Aider's classic *whole-file* edit format re-sends the entire file in every edit message. Cline reads a file, then sends it again on every plan revision. Claude Code's Read tool returns up to 2000 lines at a time. A 500-line file is ~5000 tokens; read it twenty times in a session and that's 100k tokens from one file alone.
This is by design — the agent needs the up-to-date content to reason about edits. There's no shortcut without breaking accuracy.
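The back-of-envelope for repeated reads, assuming ~10 tokens per line (which is how the text gets 500 lines ≈ 5,000 tokens):

```python
# Back-of-envelope for repeated file reads. ~10 tokens/line is an
# assumption consistent with the text's 500-line file ≈ 5,000 tokens.
TOKENS_PER_LINE = 10

def reread_tokens(lines: int, reads: int) -> int:
    """Input tokens spent re-sending the same file `reads` times."""
    return lines * TOKENS_PER_LINE * reads

tokens = reread_tokens(500, 20)   # the example from the text
cost = tokens / 1e6 * 5.00        # at $5/M input
print(f"{tokens:,} tokens -> ${cost:.2f}")
```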
3. Browser snapshots are huge
Agents that control browsers (OpenClaw with Playwright, Cursor's web tab, Cline's web preview) capture page snapshots and send them to the model. A full-page snapshot (a screenshot plus the DOM or accessibility text) can run from a few thousand to tens of thousands of tokens. Multiply by every navigation step.
4. Tool calls multiply turns
Every Read, Edit, Bash, Search, WebFetch is its own model round-trip. The model returns a tool call, your client executes it, returns the result, the model re-reads everything and decides the next step. A simple "add a function and run the tests" task can be 8-12 tool calls, each a full turn with full context.
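The loop looks roughly like this. `call_model` and `run_tool` are stand-ins, not a real SDK; the point is that every tool call passes the entire accumulated history back through the model:

```python
# Minimal sketch of the agent loop described above. Every tool call is a
# full model round-trip that re-sends all of `history`.

def call_model(history: list[dict]) -> dict:
    """Stand-in for an LLM call: pretends the model wants 3 tools, then stops."""
    tool_turns = sum(1 for m in history if m["role"] == "tool")
    if tool_turns < 3:
        return {"type": "tool_call", "name": "Read", "args": {"path": "app.py"}}
    return {"type": "answer", "text": "done"}

def run_tool(call: dict) -> str:
    return f"<contents of {call['args']['path']}>"  # stand-in tool result

history = [{"role": "user", "content": "add a function and run the tests"}]
turns = 0
while True:
    turns += 1
    reply = call_model(history)  # pays for ALL of `history` again
    if reply["type"] == "answer":
        break
    history.append({"role": "assistant", "content": str(reply)})
    history.append({"role": "tool", "content": run_tool(reply)})

print(turns)  # 3 tool calls cost 4 full round-trips
```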
5. Reasoning models think in hidden tokens
Claude Opus 4.7's extended thinking, Gemini Pro's CoT, GPT-5's o1-style reasoning — they consume output tokens before producing the answer you see. A 500-word reply might require 2000 tokens of internal reasoning first. You pay for both.
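Using the numbers from that example, the hidden share of the output bill is large. The ~1.3 tokens-per-word ratio below is an assumption, not from the text:

```python
# Hidden-reasoning overhead for the example above: a ~500-word reply
# (~650 visible tokens, assuming ~1.3 tokens/word) preceded by 2,000
# reasoning tokens you never see but still pay for.
visible = 650
hidden = 2_000

overhead = hidden / (visible + hidden)
print(f"hidden share of billed output: {overhead:.0%}")
```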
So what do you do
The instinct is to throttle: switch from Opus to Sonnet, switch from Sonnet to Haiku, set max-token caps, write shorter prompts.
This works, but it fights the agent's nature. Agents are good *because* they get full context. Cripple the context and the agent gets dumber, and you waste turns on bad outputs.
There is a much simpler lever: change the price per token, not the token count.
The cheapest fix
Swap the API endpoint. Every major agent (OpenClaw, Cline, Aider, Cursor, Continue, Claude Code, OpenCode, OpenHands, Goose, Codex CLI) supports a custom base URL on its API provider. Point at claudeapi.cheap, use your sk-cc-... key, and the same model behavior costs 70-80% less. Your agent's context-grows-quadratically problem is now a 5x-cheaper version of the same problem.
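For agents that read the standard Anthropic environment variables (Claude Code does), the swap is two lines. This assumes claudeapi.cheap is Anthropic-API-compatible as described above; check each agent's docs for its exact override setting:

```shell
# Point the agent at the cheaper endpoint instead of api.anthropic.com.
export ANTHROPIC_BASE_URL="https://claudeapi.cheap"
export ANTHROPIC_API_KEY="sk-cc-..."   # your claudeapi.cheap key
```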
Compare for the 30-turn task above:
| Endpoint | Input per 1M | 30-turn Opus task |
|---|---|---|
| Anthropic direct | $5.00 | $11.25 |
| claudeapi.cheap Basic (free) | $1.50 | $3.38 |
| claudeapi.cheap Pro ($19 lifetime) | $1.00 | $2.25 |
No agent code changes. No prompt rewriting. No model downgrades. The agent gets the same context, makes the same decisions, ships the same output. The bill is 80% smaller.
The smart combo
You can stack savings. After switching endpoints, get more efficient on top: keep sessions short so the quadratic history resets, point the agent at specific files instead of letting it re-read everything, and route routine subtasks to a cheaper model. Do all three and you can run an agent for $20-50/month that would cost $300-500/month direct.
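The stacked claim checks out on rough numbers. Every input here is an assumption layered on the text's ranges, not a measurement:

```python
# Rough check of the stacked-savings claim.
direct_monthly = 400.0    # mid-range of the $300-500/month direct figure
endpoint_discount = 0.80  # top of the 70-80% endpoint saving
efficiency_gain = 0.50    # assumed: efficiency habits halve token use

after_endpoint = direct_monthly * (1 - endpoint_discount)
after_both = after_endpoint * (1 - efficiency_gain)
print(f"${direct_monthly:.0f}/mo -> ${after_endpoint:.0f}/mo -> ${after_both:.0f}/mo")
```

The result lands inside the $20-50/month range the text claims.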
TL;DR
Agents burn tokens because they re-send the full context every turn. You can't change that architecture without weakening the agent. You *can* change what each token costs.
Pick a tool, grab a free key, and your existing agent gets 80% cheaper tonight.