Why AI agents loop forever — and 4 control patterns that work
The most expensive bug in a production agent isn't a logic bug — it's an infinite loop. The agent gets a task, calls tools, gets results, calls again, gets results — and can't exit. The model thinks it's working. It's actually burning $40 in ten minutes, rewriting the same file for the thirtieth time, getting nowhere.
I once saw a team's $1,200 overnight bill — an agent looped on "fix the tests" until the rate limit saved them at dawn. Here's why it happens and how to install circuit breakers.
Why agents loop
Three common causes.
Fuzzy termination. The agent doesn't know what "done" means. A task like "fix the tests" with no success criterion becomes endless improvement. The model works honestly until it hits a quota.
A tool that returns contradictory results. The agent runs grep, gets a list, edits, runs grep again, sees the same matches (because the edit didn't land), edits more. Without a "did state change?" check between iterations — the loop is eternal.
The model forgets what it already tried. On long chains the context drifts, early steps get forgotten, and the agent repeats actions it tried 20 steps ago. This especially hits Sonnet on tasks > 30 tool-calls.
Pattern 1: hard iteration limit
The simplest circuit breaker. Count tool-calls, stop after N with "couldn't finish in N steps."
const MAX_STEPS = 20;
let steps = 0;
while (needsMoreWork) {
if (++steps > MAX_STEPS) {
return { status: "exhausted", reason: "Max steps reached" };
}
const resp = await client.messages.create({...});
// handle tools
}
Picking N: 10 for simple agents, 30-50 for complex ones. Higher is almost always a symptom of bad task decomposition.
Pattern 2: token and dollar budget
Iterations are a poor metric if each one chews 50k tokens. Better: count tokens directly.
let totalInputTokens = 0;
const TOKEN_BUDGET = 200_000;
// after each response
totalInputTokens += resp.usage.input_tokens;
if (totalInputTokens > TOKEN_BUDGET) {
return { status: "budget_exhausted" };
}
In production I always add a dollar budget per session on top. Translate tokens to $ at current Claude pricing, and hard-stop on overage. This saves you from overnight surprises.
Pattern 3: repetition detector
The most useful pattern for real agents. Before every tool-call, check: did we already do this?
const history = new Map<string, number>();
function shouldAllow(toolName: string, args: object) {
const key = `${toolName}:${JSON.stringify(args)}`;
const count = history.get(key) ?? 0;
history.set(key, count + 1);
if (count >= 3) {
return { allow: false, reason: "Tool called with same args 3 times" };
}
return { allow: true };
}
If the agent calls read_file("config.json") three times with identical args — it's clearly not moving. Feed this back as a message ("you already read this file, result: ...") and the model usually breaks out.
Pattern 4: structured completion
Force the model to explicitly declare the task done. Add a special finish(summary) tool and state clearly in the system prompt: call it when done.
Available tools: read_file, write_file, run_tests, finish.
When the task is complete — call finish with a short report.
Without finish the result is not counted.
The model starts thinking in terms of "when can I call finish." This sharply reduces hangs. Bonus: you get a structured report instead of free-form text.
The fifth thing: observability
Everything above fails if you can't see what's happening. Minimum for a production agent:
- Log every tool_call with args and timestamp
- Log every model "thought" (thinking block, if you use it)
- Metric: steps per task (p50, p95, p99)
- Alert: session > 30 steps or > $5 — red flag
Without this you learn about a hang when the monthly bill arrives 10x higher than expected.
Pre-production checklist
- MAX_STEPS is set
- TOKEN_BUDGET is set
- Repetition detector is wired
- Explicit finish tool exists
- All tool-calls are logged
- Alerting on long sessions
- Ran 10 real tasks in sandbox, p95 < 15 steps
If the list isn't fully ticked — don't ship. Cheaper to finish the work than to reconcile where $800 went.
Need help shipping a production agent with proper circuit breakers? Get in touch. Part of the "Claude automation" package.