The 5 AI Agent Failure Modes

Why This Guide Exists

AI agents don't fail like LLMs. An LLM gives a wrong answer. One bad response. An agent takes a wrong action, then compounds it with the next action. It loops, corrupts its own state, and drifts from its objective. The blast radius grows for hours or days before anyone notices.

Every agent failure we've analyzed maps to one of five structural failure modes. This guide defines each one, shows you real incidents, gives you detection signals, and tells you what infrastructure prevents it.

The Five Failure Modes at a Glance

#	Failure Mode	One-Line Description	Real Incident	What's Missing
1	Context Blindness	Agent acts on wrong, stale, or fabricated context	Air Canada chatbot invents bereavement policy: $812 tribunal ruling	Context validation, freshness checks, policy guardrails
2	Memory Corruption	Agent mixes data between users or sessions	ChatGPT leaks billing info between users via Redis cache bug	Session isolation, state management
3	Rogue Actions	Agent takes confidently wrong real-world actions	McDonald's AI orders 260 McNuggets on a single order	Business logic guardrails, action validation, permission boundaries
4	Runaway Execution	Agent enters a loop or unbounded execution chain	Four LangChain agents burn $47K in 11 days, zero alerts	Loop detection, cost bounds, step limits
5	Silent Degradation	Agent quality erodes gradually without triggering alerts	Klarna's AI drops 22% in satisfaction. Nobody notices for months.	Continuous evaluation, drift detection, baseline monitoring

Mode 1: Context Blindness

What happens: The agent operates on wrong, missing, or fabricated context, and acts on it with full confidence.

Why it's dangerous: The agent doesn't know what it doesn't know. It fills gaps with plausible-sounding fabrications and takes actions based on them. This is the most legally dangerous failure mode.

Real incident: Air Canada's chatbot told a customer he could apply for a bereavement discount retroactively. The actual policy said the opposite. The customer relied on the chatbot's answer, was denied the discount, and won a tribunal ruling. The tribunal said: "It makes no difference whether the information comes from a static page or a chatbot."

Detection signals:

Agent confidently answers questions outside its verified knowledge
Responses reference outdated policies, prices, or facts
No fallback to "I don't know" or human handoff

What prevents it: Context grounding verification. Freshness checks on injected data. Policy guardrails that restrict the agent to verified information.
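A freshness check with a fallback can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `ContextDoc` type, the `verified` flag, the 30-day `MAX_AGE` window, and the handoff wording are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ContextDoc:
    """Hypothetical record for one piece of injected context."""
    source: str
    text: str
    verified: bool          # set by whatever review process approves content
    fetched_at: datetime    # when this copy was retrieved


MAX_AGE = timedelta(days=30)  # assumed freshness window; tune per data source


def usable_context(docs, now=None, max_age=MAX_AGE):
    """Keep only documents that are both verified and recent enough."""
    now = now or datetime.now(timezone.utc)
    return [d for d in docs if d.verified and now - d.fetched_at <= max_age]


def answer_or_handoff(question, docs, now=None):
    """Answer only from grounded context; otherwise admit ignorance and hand off."""
    grounded = usable_context(docs, now)
    if not grounded:
        return {"action": "handoff",
                "reply": "I don't know. Let me connect you with a person."}
    return {"action": "answer", "sources": [d.source for d in grounded]}
```

The key design choice is that an empty grounded set produces a handoff rather than a best guess, which is the "I don't know" fallback the detection signals look for.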

Mode 2: Memory Corruption

What happens: Session boundaries leak. Context from one user bleeds into another's conversation. The agent serves personalized responses to the wrong person.

Why it's dangerous: This isn't a model problem. It's an infrastructure problem: shared state where there should be strict per-session isolation. In healthcare, it's an immediate HIPAA violation. In finance, it's a regulatory incident.

Real incident: A bug in the open-source Redis client library used by OpenAI caused ChatGPT to expose chat history titles and billing information (names, emails, last four credit card digits) to other users.

Detection signals:

User-specific data appears in other users' sessions
Agent references context from previous conversations it shouldn't have
Load testing reveals state leakage under concurrent connections

What prevents it: Strict session isolation. Per-user state management. Concurrent load testing before deployment.
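As a rough sketch of per-session isolation plus a concurrency check, the Python below keys every read and write by session ID and then hammers the store from many threads. The `SessionStore` class and `leak_test` helper are hypothetical names for illustration; a production system would sit on a real datastore and run a far heavier load test.

```python
import threading
from collections import defaultdict


class SessionStore:
    """In-memory store where every read and write is keyed by session ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = defaultdict(dict)

    def put(self, session_id, key, value):
        with self._lock:
            self._data[session_id][key] = value

    def get(self, session_id, key, default=None):
        with self._lock:
            # .get() avoids creating an empty entry for unknown sessions
            return self._data.get(session_id, {}).get(key, default)


def leak_test(store, n_sessions=50):
    """Write from many threads at once, then confirm each session sees only its own data."""

    def worker(i):
        store.put(f"s{i}", "email", f"user{i}@example.com")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_sessions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(
        store.get(f"s{i}", "email") == f"user{i}@example.com"
        for i in range(n_sessions)
    )
```

There is no API for reading state without naming a session, so a lookup can never fall back to shared or "most recent" data.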

Mode 3: Rogue Actions

What happens: The agent takes real-world actions with full confidence. The wrong ones.

Why it's dangerous: The system reports success. Every API call returns 200. The business outcome is wrong, and by the time someone checks, the action has already happened.

Real incident: McDonald's AI drive-thru system added 260 Chicken McNuggets to a single order. No quantity limits, no sanity checks. McDonald's ended the partnership after the system reached only 85% accuracy. Applied to ~6.5 million daily orders, a 15% error rate would mean ~975,000 wrong orders per day.

Detection signals:

Agent actions pass validation but produce absurd results
Agent makes promises or commitments outside its authority
Users can steer the agent through prompt injection

What prevents it: Business logic guardrails. Quantity limits, price floors, permission boundaries. Pre-execution checks against known business rules.
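A pre-execution check of this kind might look like the following Python sketch. The `Action` shape, the allowed action kinds, and the specific limits (20 items, a $0.50 price floor) are invented for illustration; real rules would come from the business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical description of something the agent wants to do."""
    kind: str
    item: str
    quantity: int
    unit_price: float


# Assumed limits for the example only
MAX_QUANTITY_PER_ITEM = 20
MIN_UNIT_PRICE = 0.50
ALLOWED_KINDS = {"add_item", "remove_item"}


def check_action(action):
    """Return a list of rule violations; an empty list means the action may run."""
    violations = []
    if action.kind not in ALLOWED_KINDS:
        violations.append(f"action '{action.kind}' is not permitted")
    if not 1 <= action.quantity <= MAX_QUANTITY_PER_ITEM:
        violations.append(
            f"quantity {action.quantity} outside 1..{MAX_QUANTITY_PER_ITEM}"
        )
    if action.unit_price < MIN_UNIT_PRICE:
        violations.append(
            f"unit price {action.unit_price} below floor {MIN_UNIT_PRICE}"
        )
    return violations
```

Under these assumed limits, `check_action(Action("add_item", "McNuggets", 260, 4.99))` returns one violation, so the order would be blocked or routed to a human before any API call is made.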

Mode 4: Runaway Execution

What happens: The agent enters a loop or unbounded execution chain and never stops.

Why it's dangerous: The agent isn't malfunctioning by its own measure: it's doing exactly what it was designed to do. HTTP 200. No errors. Costs compound for hours or days before anyone checks the billing dashboard.

Real incident: A schema drift edge case put four LangChain agents into a recursive retry spiral. Every message returned HTTP 200. No alerts fired. The agents ran for eleven days: $47,000 in API costs before anyone noticed.

Detection signals:

API costs rising without corresponding output
Near-identical actions repeating with slight variations
Execution time far exceeding baseline

What prevents it: Loop detection (3 iterations). Cost bounds ($10/execution). Step limits (50 steps). Duration limits. Circuit breakers.
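These bounds can be combined into one small guard that the agent loop calls before every step. The sketch below is illustrative only: `ExecutionGuard` and `BoundsExceeded` are hypothetical names, the limits are constructor parameters (defaulting to the audit's starting values of 3 iterations, $10, and 50 steps), and loop detection is simplified to exact consecutive repeats. Catching near-identical actions would require normalizing them first.

```python
class BoundsExceeded(RuntimeError):
    """Raised when a run crosses one of its execution bounds."""


class ExecutionGuard:
    def __init__(self, max_repeats=3, max_cost_usd=10.0, max_steps=50):
        self.max_repeats = max_repeats
        self.max_cost_usd = max_cost_usd
        self.max_steps = max_steps
        self.steps = 0
        self.cost_usd = 0.0
        self._last_action = None
        self._repeats = 0

    def record(self, action, cost_usd):
        """Call before executing each step; raises once any bound is crossed."""
        self.steps += 1
        self.cost_usd += cost_usd

        # Count how many times in a row the same action has been requested
        if action == self._last_action:
            self._repeats += 1
        else:
            self._last_action, self._repeats = action, 1

        if self._repeats > self.max_repeats:
            raise BoundsExceeded(f"'{action}' repeated {self._repeats} times")
        if self.cost_usd > self.max_cost_usd:
            raise BoundsExceeded(f"cost ${self.cost_usd:.2f} over cap")
        if self.steps > self.max_steps:
            raise BoundsExceeded(f"{self.steps} steps over limit")
```

Because the guard raises an exception instead of logging a warning, a retry spiral stops on the fourth identical call, even when every underlying request returns HTTP 200.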

Mode 5: Silent Degradation

What happens: Agent quality declines gradually. No errors, no alerts, no clear incident. Just worse outcomes, slowly.

Why it's dangerous: Because nothing breaks. Teams blame seasonality, "AI being AI," or user behavior. By the time the drop is undeniable, months of damage have already compounded.

Real incident: Klarna's AI assistant handled the work of 700 full-time agents. Then satisfaction dropped sharply. Complex issues looped without resolution. By early 2025, the CEO publicly admitted the AI overhaul "led to lower quality" and began hiring again.

Detection signals:

Slow decline in satisfaction or resolution scores
Different output quality across model versions
Behavior changes after upstream provider updates

What prevents it: Continuous evaluation. Automated quality scoring on daily samples. Baseline monitoring. Model version pinning. Regression testing on each provider update.
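Baseline monitoring can start as a comparison between a stored baseline and the most recent daily sample. The Python sketch below assumes quality scores between 0 and 1 from some automated scorer and a 5% relative-drop threshold; both are placeholders, and `detect_drift` is a hypothetical helper rather than part of any library.

```python
from statistics import mean


def detect_drift(baseline_scores, recent_scores, max_relative_drop=0.05):
    """Flag drift when the recent mean falls more than max_relative_drop below baseline."""
    if not baseline_scores or not recent_scores:
        raise ValueError("both score lists must be non-empty")
    baseline = mean(baseline_scores)
    if baseline <= 0:
        raise ValueError("baseline mean must be positive")
    recent = mean(recent_scores)
    drop = (baseline - recent) / baseline
    return {
        "baseline": baseline,
        "recent": recent,
        "relative_drop": drop,
        "drifted": drop > max_relative_drop,
    }
```

Running this daily against a fixed baseline, rather than against yesterday's score, is what catches a slow slide: each day may look similar to the previous one while the cumulative drop keeps growing.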

The 15-Minute Agent Audit

Before you ship your next agent update, answer these five questions:

#	Failure Mode	Question	If "No"
1	Context Blindness	Can your agent answer "I don't know" when a question falls outside its verified knowledge?	You need context grounding + fallback responses
2	Memory Corruption	Have you tested your agent under concurrent load? Does User A's data ever appear in User B's session?	You need session isolation testing
3	Rogue Actions	Does your agent have hard limits on what it can do? (Max order size, max refund, restricted actions)	You need business logic guardrails
4	Runaway Execution	If your agent enters a retry loop right now, what stops it? Is there a cost cap? A step limit?	You need execution bounds (start with: 3 iterations, $10 ceiling, 50 steps)
5	Silent Degradation	Do you have automated quality scoring running daily on a sample of agent interactions?	You need continuous evaluation + baseline monitoring

Scored 3+ "No" answers? Your agent has significant exposure to production failures. Prioritize the highest-cost failure mode first (usually Runaway Execution or Rogue Actions).

Scored 1-2 "No" answers? You're ahead of most teams, but the gaps you have are the ones that will bite you. The question isn't if, but when.

All "Yes"? You're in the top 5% of agent deployments.


Just Shipped

Clyro is the Agent Kernel: runtime governance for AI agents. The Prevention Stack ships with every deployment:

Control	Default	What It Prevents
Loop detection	3 iterations	Runaway Execution ($47K loops)
Cost bounds	$10 per run	Uncapped API spending
Step limits	50 actions	Unbounded execution chains
Business logic guardrails	Your rules	Rogue Actions (260 McNuggets)

These are defaults, not suggestions. Your agent is safe out of the box.