The 5 AI Agent Failure Modes
Why This Guide Exists
AI agents don't fail like LLMs. An LLM gives a wrong answer. One bad response. An agent takes a wrong action, then compounds it with the next action. It loops, corrupts its own state, and drifts from its objective. The blast radius grows for hours or days before anyone notices.
Every agent failure we've analyzed maps to one of five structural failure modes. This guide defines each one, shows you real incidents, gives you detection signals, and tells you what infrastructure prevents it.
The Five Failure Modes at a Glance
| # | Failure Mode | One-Line Description | Real Incident | What's Missing |
|---|---|---|---|---|
| 1 | Context Blindness | Agent acts on wrong, stale, or fabricated context | Air Canada chatbot invents bereavement policy: C$812 tribunal award | Context validation, freshness checks, policy guardrails |
| 2 | Memory Corruption | Agent mixes data between users or sessions | ChatGPT leaks billing info between users via Redis cache bug | Session isolation, state management |
| 3 | Rogue Actions | Agent takes confidently wrong real-world actions | McDonald's AI adds 260 McNuggets to a single order | Business logic guardrails, action validation, permission boundaries |
| 4 | Runaway Execution | Agent enters loop or unbounded execution chain | Autonomous coding agent burns $47K in 11 days, zero alerts | Loop detection, cost bounds, step limits |
| 5 | Silent Degradation | Agent quality erodes gradually without triggering alerts | Klarna's AI drops 22% in satisfaction. Nobody notices for months. | Continuous evaluation, drift detection, baseline monitoring |
Mode 1: Context Blindness
What happens: The agent operates on wrong, missing, or fabricated context, and acts on it with full confidence.
Why it's dangerous: The agent doesn't know what it doesn't know. It fills gaps with plausible-sounding fabrications and takes actions based on them. This is the most legally dangerous failure mode.
Real incident: Air Canada's chatbot told a customer he could apply for a bereavement discount retroactively. The actual policy said the opposite. The customer relied on the chatbot's answer, was denied the discount, and won a tribunal ruling against the airline. The tribunal said: "It makes no difference whether the information comes from a static page or a chatbot."
Detection signals:
- Agent confidently answers questions outside its verified knowledge
- Responses reference outdated policies, prices, or facts
- No fallback to "I don't know" or human handoff
What prevents it: Context grounding verification. Freshness checks on injected data. Policy guardrails that restrict the agent to verified information.
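A freshness check with an "I don't know" fallback can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `ContextItem`, `select_usable_context`, `answer_or_fallback`, the `verified` flag, and the fallback wording are all hypothetical names chosen for this example, and `generate` stands in for whatever function calls your model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class ContextItem:
    """One piece of injected context (hypothetical structure)."""
    text: str
    source: str            # where the item came from, e.g. "policy-db"
    fetched_at: datetime   # when the item was last verified against its source
    verified: bool = True  # False if the item did not come from an approved source


FALLBACK = "I don't know. Let me connect you with a human agent."


def select_usable_context(items: List[ContextItem], max_age: timedelta,
                          now: Optional[datetime] = None) -> List[ContextItem]:
    """Keep only items that are verified and no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [i for i in items if i.verified and now - i.fetched_at <= max_age]


def answer_or_fallback(items: List[ContextItem], max_age: timedelta,
                       generate: Callable[[List[ContextItem]], str],
                       now: Optional[datetime] = None) -> str:
    """Call the model only when usable context exists; otherwise hand off."""
    usable = select_usable_context(items, max_age, now)
    if not usable:
        return FALLBACK
    return generate(usable)
```

The design choice here is that stale or unverified context is dropped before the model ever sees it, so the agent cannot quote an outdated policy with confidence. The right `max_age` depends on how often the underlying data changes.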
Mode 2: Memory Corruption
What happens: Session boundaries leak. Context from one user bleeds into another's conversation. The agent serves personalized responses to the wrong person.
Why it's dangerous: This isn't a model problem. It's an infrastructure problem: shared state where there should be strict per-session isolation. In healthcare, it's an immediate HIPAA violation. In finance, it's a regulatory incident.
Real incident: A bug in the open-source Redis client library that OpenAI used (redis-py) caused ChatGPT to expose chat history titles and billing information (names, emails, last four credit card digits) between users.
Detection signals:
- User-specific data appears in other users' sessions
- Agent references context from previous conversations it shouldn't have
- Load testing reveals state leakage under concurrent connections
What prevents it: Strict session isolation. Per-user state management. Concurrent load testing before deployment.
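One way to picture per-session isolation plus a concurrent check is the sketch below. It assumes a simple in-memory store; `SessionStore` and `check_isolation` are hypothetical names for illustration, and a production system would run the same kind of check against its real cache or database layer under realistic load.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


class SessionStore:
    """In-memory state where every read and write requires a session ID."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Dict[str, Any]] = {}

    def put(self, session_id: str, key: str, value: Any) -> None:
        with self._lock:
            self._data.setdefault(session_id, {})[key] = value

    def get(self, session_id: str, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(session_id, {}).get(key, default)


def check_isolation(store: SessionStore, n_sessions: int = 50,
                    workers: int = 16) -> List[str]:
    """Write a unique marker per session concurrently, then read it back.

    Returns the IDs of sessions that read back someone else's data.
    """
    def roundtrip(i: int):
        sid = f"session-{i}"
        store.put(sid, "email", f"user{i}@example.com")
        return sid, store.get(sid, "email")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(roundtrip, range(n_sessions)))  # map preserves input order

    return [sid for i, (sid, seen) in enumerate(results)
            if seen != f"user{i}@example.com"]
```

An empty list from `check_isolation` means no leakage was observed in that run. It does not prove isolation, so a check like this is worth repeating at higher concurrency before each deployment.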
Mode 3: Rogue Actions
What happens: The agent takes real-world actions with full confidence. The wrong ones.
Why it's dangerous: The system reports success. Every API call returns 200. The business outcome is wrong, and by the time someone checks, the action has already happened.
Real incident: McDonald's AI drive-thru system added 260 Chicken McNuggets to a single order. No quantity limits, no sanity checks. McDonald's ended its partnership with IBM after the system reportedly reached only about 85% accuracy. At that error rate, ~6.5 million daily orders would mean ~975,000 wrong orders per day (15% of 6.5 million).
Detection signals:
- Agent actions pass technical validation but produce absurd business results
- Agent makes promises or commitments outside its authority
- Users can steer the agent through prompt injection
What prevents it: Business logic guardrails. Quantity limits, price floors, permission boundaries. Pre-execution checks against known business rules.
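A pre-execution check against business rules might look like the sketch below. The limits (`MAX_QUANTITY_PER_ITEM`, `MIN_UNIT_PRICE`), the allowed-action set, and every function name are hypothetical values chosen for illustration; real limits come from your own business rules.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class OrderLine:
    item: str
    quantity: int
    unit_price: float


class GuardrailViolation(Exception):
    """Raised when a proposed action breaks a business rule."""


# Hypothetical limits for illustration only.
MAX_QUANTITY_PER_ITEM = 20
MIN_UNIT_PRICE = 0.50
ALLOWED_ACTIONS = {"add_item", "remove_item"}  # permission boundary


def validate_action(action: str, line: OrderLine) -> None:
    """Check a proposed action against business rules before it runs."""
    if action not in ALLOWED_ACTIONS:
        raise GuardrailViolation(f"action {action!r} is not permitted")
    if not 1 <= line.quantity <= MAX_QUANTITY_PER_ITEM:
        raise GuardrailViolation(
            f"quantity {line.quantity} is outside 1..{MAX_QUANTITY_PER_ITEM}")
    if line.unit_price < MIN_UNIT_PRICE:
        raise GuardrailViolation(
            f"unit price {line.unit_price} is below the floor {MIN_UNIT_PRICE}")


def execute_with_guardrails(action: str, line: OrderLine,
                            execute: Callable[[str, OrderLine], Any]) -> Any:
    """Run `execute` only if the action passes every rule."""
    validate_action(action, line)
    return execute(action, line)
```

With these example limits, a request to add 260 McNuggets raises `GuardrailViolation` before any API call is made, which is the point: the check runs before the side effect, not after.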
Mode 4: Runaway Execution
What happens: The agent enters a loop or unbounded execution chain and never stops.
Why it's dangerous: The agent isn't malfunctioning by its own measure: it's doing exactly what it was designed to do. HTTP 200. No errors. Costs compound for hours or days before anyone checks the billing dashboard.
Real incident: A schema drift edge case put four LangChain agents into a recursive retry spiral. Every message returned HTTP 200. No alerts fired. The agents ran for eleven days: $47,000 in API costs before anyone noticed.
Detection signals:
- API costs rising without corresponding output
- Near-identical actions repeating with slight variations
- Execution time far exceeding baseline
What prevents it: Loop detection (3 iterations). Cost bounds ($10/execution). Step limits (50 steps). Duration limits. Circuit breakers.
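Execution bounds of this kind can be sketched as a small counter object that the agent loop consults on every step. The class and function names below are hypothetical, and the defaults (3 repeats, $10, 50 steps) follow the starting values in this guide's audit table. The sketch assumes the caller supplies a normalized `action_signature`, so that near-identical retries map to the same string, and it treats "3 iterations" as three allowed repeats with the fourth identical action tripping the breaker.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


class ExecutionHalted(Exception):
    """Raised when a run exceeds one of its bounds."""


@dataclass
class ExecutionBounds:
    max_repeats: int = 3         # identical actions allowed before halting
    max_cost_usd: float = 10.0   # cost ceiling per run
    max_steps: int = 50          # total actions allowed per run
    steps: int = 0
    cost_usd: float = 0.0
    seen: Counter = field(default_factory=Counter)

    def record(self, action_signature: str, cost_usd: float) -> None:
        """Account for one action and halt the run if any bound is exceeded."""
        self.steps += 1
        self.cost_usd += cost_usd
        self.seen[action_signature] += 1
        if self.seen[action_signature] > self.max_repeats:
            raise ExecutionHalted(f"loop detected: {action_signature!r} repeated "
                                  f"{self.seen[action_signature]} times")
        if self.cost_usd > self.max_cost_usd:
            raise ExecutionHalted(f"cost bound exceeded: ${self.cost_usd:.2f}")
        if self.steps > self.max_steps:
            raise ExecutionHalted(f"step limit exceeded: {self.steps} steps")


def run_agent(next_action: Callable[[], Optional[Tuple[str, float]]],
              bounds: ExecutionBounds) -> int:
    """Drive a hypothetical agent loop.

    `next_action` returns (signature, cost_usd), or None when the agent is done.
    """
    while (step := next_action()) is not None:
        bounds.record(*step)
    return bounds.steps
```

Because the bounds are enforced outside the agent's own logic, a retry spiral that returns HTTP 200 on every call still stops: the check depends on counts and spend, not on error codes.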
Mode 5: Silent Degradation
What happens: Agent quality declines gradually. No errors, no alerts, no clear incident. Just worse outcomes, slowly.
Why it's dangerous: Because nothing breaks. Teams blame seasonality, "AI being AI," or user behavior. By the time the drop is undeniable, months of damage have already compounded.
Real incident: Klarna's AI assistant handled the work of 700 full-time agents. Then satisfaction dropped sharply. Complex issues looped without resolution. By early 2025, the CEO publicly admitted the AI overhaul "led to lower quality" and began hiring again.
Detection signals:
- Slow decline in satisfaction or resolution scores
- Different output quality across model versions
- Behavior changes after upstream provider updates
What prevents it: Continuous evaluation. Automated quality scoring on daily samples. Baseline monitoring. Model version pinning. Regression testing on each provider update.
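Baseline monitoring can start as a comparison between a frozen baseline sample and the most recent daily sample. The sketch below is one simple approach under stated assumptions: quality scores are numbers where higher is better, `detect_drift` is a hypothetical helper, and the 5% relative-drop threshold is an illustrative value, not a recommendation.

```python
from statistics import mean
from typing import Sequence, Tuple


def detect_drift(baseline_scores: Sequence[float],
                 recent_scores: Sequence[float],
                 max_relative_drop: float = 0.05) -> Tuple[bool, float]:
    """Compare the recent mean quality score against a frozen baseline.

    Returns (drifted, relative_drop). `drifted` is True when the recent
    mean has fallen more than `max_relative_drop` below the baseline mean.
    """
    if not baseline_scores or not recent_scores:
        raise ValueError("both samples must be non-empty")
    baseline = mean(baseline_scores)
    if baseline <= 0:
        raise ValueError("baseline mean must be positive")
    relative_drop = (baseline - mean(recent_scores)) / baseline
    return relative_drop > max_relative_drop, relative_drop
```

Running a check like this on each day's sample, and again after every provider or model version change, turns a slow decline into an explicit alert instead of something a team notices months later. A mean comparison ignores variance and sample size, so a statistical test may be a better fit once volumes allow it.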
The 15-Minute Agent Audit
Before you ship your next agent update, answer these five questions:
| # | Failure Mode | Question | If "No" |
|---|---|---|---|
| 1 | Context Blindness | Can your agent answer "I don't know" when a question falls outside its verified knowledge? | You need context grounding + fallback responses |
| 2 | Memory Corruption | Have you tested your agent under concurrent load? Does User A's data ever appear in User B's session? | You need session isolation testing |
| 3 | Rogue Actions | Does your agent have hard limits on what it can do? (Max order size, max refund, restricted actions) | You need business logic guardrails |
| 4 | Runaway Execution | If your agent enters a retry loop right now, what stops it? Is there a cost cap? A step limit? | You need execution bounds (start with: 3 iterations, $10 ceiling, 50 steps) |
| 5 | Silent Degradation | Do you have automated quality scoring running daily on a sample of agent interactions? | You need continuous evaluation + baseline monitoring |
Scored 3+ "No" answers? Your agent has significant exposure to production failures. Prioritize the highest-cost failure mode first (usually Runaway Execution or Rogue Actions).
Scored 1-2 "No" answers? You're ahead of most teams, but the gaps you have are the ones that will bite you. The question isn't if, but when.
All "Yes"? You're in the top 5% of agent deployments.
Just Shipped
Clyro is the Agent Kernel: runtime governance for AI agents. The Prevention Stack ships with every deployment:
| Control | Default | What It Prevents |
|---|---|---|
| Loop detection | 3 iterations | Runaway Execution ($47K loops) |
| Cost bounds | $10 per run | Uncapped API spending |
| Step limits | 50 actions | Unbounded execution chains |
| Business logic guardrails | Your rules | Rogue Actions (260 McNuggets) |
These are defaults, not suggestions. Your agent is safe out of the box.