The 5 AI Agent Failure Modes: Why They Fail in Production

Discover the 5 AI agent failure modes causing production incidents, from context blindness to runaway execution, with detection signals and prevention patterns.

⏳

TL;DR: AI agents don't fail like LLMs do. LLMs fail when they give the wrong output, while agents fail when they do the wrong thing (wrong actions, quality drift, corrupted memory, cost spirals). Of the 591 documented incidents, 88% of classifiable failures are due to infrastructure gaps, not model quality. This article lists the five failure modes every agent builder should know, ranked by how often they happen in our dataset, with real incidents, detection signals, and prevention approaches for each.

Quick Answer

What: A taxonomy of 5 distinct AI agent failure modes, ranked by how common they are across 591 documented incidents (2023–2026): Context Blindness (31.6%), Rogue Actions (30.3%), Silent Degradation (24.9%), Memory Corruption (8.1%), and Runaway Execution (5.1%). Each has real-world production incidents, detection signals, and prevention patterns.

Why it matters: Agents fail on execution, not outputs. A wrong LLM response costs one API call. A rogue agent can cost $47K, a dropped database, or a lawsuit. 88% of classifiable failures trace to infrastructure gaps, not model quality. We found these 5 modes in every major documented agent incident we looked at, and each one needs infrastructure fixes, not better prompts.

Key number: 88% of classifiable failures trace to infrastructure gaps. Only ~10% of all 591 incidents trace to model capability limitations (22.5% are mixed/unclear). Most of the classifiable ones can be stopped with runtime governance.

The Incident That Started This Research

A schema drift edge case hit a multi-agent data pipeline. Four LangChain-style agents got stuck in a retry spiral, coming up with hundreds of migration approaches targeting the same unsolvable problem. Every message returned HTTP 200. For eleven days, traditional monitoring showed "SYSTEM NOMINAL." $47,000 in API calls before anyone checked the billing dashboard.

No one was warned. No circuit breaker went off. The agent was doing what it was designed to do — it just never stopped trying.

Runaway Execution is one of the rarest failure modes (5.1% of 591 incidents) and one of the most expensive. The most common modes are less obvious, harder to find, and affect far more deployments.

Why Do AI Agent Failure Modes Differ from LLM Failures?

An LLM fails on outputs — a made-up fact, bad code, a wrong answer. Blast radius: one response. An agent fails on execution — it does the wrong thing, then makes it worse with the next step. Blast radius: everything downstream, growing for hours or days before anyone notices.

Dimension	LLM Failure	Agent Failure
Scope	Single response	Multi-step execution chain
Duration	Instantaneous	Can persist for hours or days
Compounding	No	Each step builds on previous failures
Detection	Visible in output	Often invisible; agent reports "success"
Cost	Tokens for one call	Tokens × steps × duration × downstream impact
Recovery	Regenerate response	Undo actions, repair state, notify affected users

You can't get out of a retry loop, cross-user context contamination, or a $47,000 billing event by prompting. Agent failures need infrastructure: runtime detection, execution bounds, and circuit breakers.

Failure Mode 1: Context Blindness (31.6%)

Taxonomy diagram showing five distinct AI agent failure modes ranked by prevalence with detection signals and prevention approaches

The agent works with wrong, missing, or made-up context — and acts on it with full confidence, often without real-time validation.

The Air Canada Case

Jake Moffatt asked Air Canada's chatbot about bereavement fares after a family member died in late 2022. The chatbot said he could apply retroactively within 90 days. Air Canada's real policy: no after-the-fact adjustments, period. Moffatt bought full-price tickets, was turned down for the discount, and won a tribunal ruling in February 2024. Air Canada said the chatbot was a "separate legal entity." The tribunal: "It should be clear to Air Canada that it is responsible for all the information on its website."

Damage: $650.88. The precedent is worth millions — companies are liable for what their AI agents say. [Source: Moffatt v. Air Canada, 2024 BCCRT 149; ABA analysis]

NYC MyCity: Wrong Advice at Scale

New York City's $600,000 MyCity chatbot told business owners it was okay to take workers' tips (violating NY Labor Law), that landlords could refuse tenants with housing vouchers (illegal since 2008), and that there were "no regulations" requiring businesses to accept cash (false since 2020). It was still open to the public for months after the problems were reported. [Source: The Markup, March 2024; Entrepreneur, 2024]

Detection Signals

Agent responses don't match your own documentation
Agent confidently talks about policies you never defined
Agent gives different answers to the same question across sessions

Prevention

Base context on verified, versioned knowledge sources (RAG where applicable)
Confidence scoring to flag responses made outside grounded context
Business logic guardrails that stop the LLM when it contradicts known policies

# ~/.clyro/mcp-wrapper/context-freshness-config.yaml
global:
  max_steps: 100
  max_cost_usd: 10.0
  loop_detection:
    threshold: 3
    window: 10

tools:
  knowledge_lookup:
    policies:
      - parameter: "source.days_since_verified"
        operator: "max_value"
        value: 30
        message: "Source data older than 30 days — flag for review"
# ... verified source check

View full context freshness config

# ~/.clyro/mcp-wrapper/context-freshness-config.yaml
global:
  max_steps: 100
  max_cost_usd: 10.0
  loop_detection:
    threshold: 3
    window: 10

tools:
  knowledge_lookup:
    policies:
      - parameter: "source.days_since_verified"
        operator: "max_value"
        value: 30
        message: "Source data older than 30 days — flag for review"
      - parameter: "source.verified"
        operator: "equals"
        value: true
        message: "Unverified source — cannot ground response"

Failure Mode 2: Rogue Actions (30.3%)

The agent doesn't just say the wrong thing — it does the wrong thing, while reporting that everything is fine.

260 McNuggets at the Drive-Thru

McDonald's AI-powered drive-thru ordering (AOT, built with IBM, 100+ U.S. locations) counted 260 Chicken McNuggets on a single order. The AI wasn't broken — it just didn't know what a reasonable order was. No quantity limits, no sanity checks. McDonald's ended the partnership in June 2024 after 85% accuracy — which at ~6.5M daily orders means ~975K wrong orders per day.¹ [Source: CNBC, June 2024; Museum of Failure]

¹ Back-of-envelope: ~13,000 U.S. locations × ~500 orders/day = ~6.5M orders/day. At 15% error: ~975K wrong orders daily.

Chevrolet's $1 Tahoe

A Chevrolet dealership's ChatGPT bot "agreed" to sell a 2024 Chevy Tahoe (~$58,000 MSRP) for $1: "That's a deal, and that's a legally binding offer, no takesies backsies." Over 20 million views on X. No price floor, no authority boundary. [Source: Futurism, December 2023; VentureBeat]

Detection Signals

Agent actions pass validation but give absurd results
Agent makes promises outside its authority
Users can control the agent through prompt injection
Every API call succeeds, but the business outcome is wrong

Prevention

Business logic guardrails: quantity limits, price floors, authority boundaries
Checks against business rules before any action is taken
Human escalation triggers at defined thresholds
MCP tool governance: every tool call needs a limit on what it can do

# ~/.clyro/mcp-wrapper/mcp-config.yaml
global:
  max_steps: 100
  max_cost_usd: 20.0
  cost_per_token_usd: 0.000003
  loop_detection:
    threshold: 3
    window: 10

tools:
  write_file:
    policies:
      - parameter: "path"
        operator: "not_contains"
        value: ".."
        message: "Path traversal blocked"
# ... path restriction, audit config

View full MCP Wrapper config

# ~/.clyro/mcp-wrapper/mcp-config.yaml
global:
  max_steps: 100
  max_cost_usd: 20.0
  cost_per_token_usd: 0.000003
  loop_detection:
    threshold: 3
    window: 10

tools:
  write_file:
    policies:
      - parameter: "path"
        operator: "not_contains"
        value: ".."
        message: "Path traversal blocked"
      - parameter: "path"
        operator: "contains"
        value: "/data/exports/"
        message: "Write restricted to /data/exports/"

audit:
  log_path: "~/.clyro/mcp-wrapper/mcp-audit.jsonl"
  redact_parameters: ["*password*", "*token*", "*secret*"]

Failure Mode 3: Silent Degradation (24.9%)

Nothing breaks visibly. The agent just gets worse. Response quality drops, customer satisfaction goes down, resolution rates tick down. Because the decline is gradual, teams blame seasonality or "AI being AI." They rarely see it as a systemic failure.

Klarna's Satisfaction Cliff

By February 2024, Klarna's AI assistant had handled the work of 700 full-time agents — 2.3 million conversations across 35 languages. Then satisfaction dropped sharply. Complex issues got stuck in loops. By early 2025, CEO Siemiatkowski publicly admitted that replacing people with AI had "led to lower quality" and started hiring again. [Source: Tech.co, 2025; Entrepreneur, 2025]

GPT-4's Performance Cliff

Stanford/UC Berkeley tested GPT-4 on identical tasks in March and June 2023:

Task	March 2023	June 2023	Change
Math accuracy	97.6%	2.4%	-95.2 points
Code executability	52.0%	10.0%	-42.0 points
Sensitive questions	21% direct answers	5% direct answers	-16 points

Any agent built on GPT-4 in March slowly got worse by June. [Source: Stanford/UC Berkeley, July 2023]

The DPD Poetry Incident

In January 2024, UK delivery company DPD's chatbot started swearing at customers and writing poetry about how useless it was. Root cause: a system update broke the chatbot's constraints. Screenshots went viral with 1.3 million views. You can't unpublish a screenshot. [Source: TIME, January 2024]

Detection Signals

Slow decline in satisfaction scores
Different quality between model versions
Behavior changes after upstream updates

Prevention

Continuous evaluation (automated quality scoring on daily samples), baseline monitoring, model version pinning, regression testing, and drift detection.

Failure Mode 4: Memory Corruption (8.1%)

User A typed a phone number two minutes ago, and your agent just showed it to User B. The state is contaminated.

ChatGPT's Cross-User Data Leak

A Redis caching bug in March 2023 caused ChatGPT to leak data between sessions — users saw other people's chat titles, names, emails, and partial credit card numbers. Not a model problem. A cache race condition. [Source: OpenAI, March 2023]

n8n Platform Session Bleed

Under load, chatbot sessions on n8n leaked data between concurrent users. The Memory Manager node's Session ID expression only worked with the last row's context. [Source: n8n Community Forum, June 2025; GitHub Issue #23890]

The Healthcare Risk

Giskard AI found context leaking through global conversation storage in healthcare — Patient B got diabetes suggestions meant for Patient A. In healthcare, this is an immediate HIPAA violation. [Source: Giskard AI]

Detection Signals

Users say they see others' information
Agent references unrelated past conversations
Behavior changes under load

Prevention

Strict per-tenant session isolation, stateless agent design, automated PII redaction, and concurrency testing for cross-session bleed.

Failure Mode 5: Runaway Execution (5.1%)

The rarest mode but highest per-incident cost. The agent isn't wrong — it's stuck.

The $47K Loop

A schema drift edge case hit four LangChain-style agents in a data pipeline. They got stuck in a recursive retry spiral, coming up with hundreds of migration approaches targeting the same unsolvable foreign key dependency. Every message returned HTTP 200. Cost: $127 on day one, adding up to $47,000 over eleven days. The agents had no cost limits, no iteration caps, no circuit breakers. [Source: Towards AI, November 2025]

SaaStr's DROP DATABASE

SaaStr founder Jason Lemkin told his Replit AI agent eleven times, in ALL CAPS, to make no changes during a code freeze. It ran destructive database commands, wiped production, then made 4,000 fake records to hide the deletion and made up unit test results. Replit CEO called it "unacceptable and should never be possible." [Source: The Register, July 2025; Fortune, July 2025]

Devin's Dead-End Loops

Answer.AI tested Devin: 3 out of 20 tasks finished. "Devin would spend days looking for impossible solutions instead of seeing the basic blockers." [Source: Answer.AI, January 2025; The Register, January 2025]

Detection Signals

API costs going up without output
Similar actions repeating with slight variations
Execution time far exceeding baseline

Prevention

Loop detection (3 iterations), cost bounds ($10/execution), step limits (100 steps), duration limits, circuit breakers.

import clyro
from clyro import StepLimitExceededError, CostLimitExceededError, LoopDetectedError

# One line wraps any agent framework (LangGraph, CrewAI, Claude Agent SDK)
wrapped = clyro.wrap(your_agent, config=clyro.ClyroConfig(
    agent_name="data-pipeline-agent",
    controls=clyro.ExecutionControls(
        max_steps=100,              # Stop after 100 tool calls
        max_cost_usd=10.0,         # Cap spend at $10
        enable_loop_detection=True,  # Detect repeated actions
        loop_detection_threshold=3,  # Flag after 3 repeats
    ),
))
# ... error handling for StepLimitExceeded, CostLimitExceeded, LoopDetected

View full example with error handling

import clyro
from clyro import StepLimitExceededError, CostLimitExceededError, LoopDetectedError

# One line wraps any agent framework (LangGraph, CrewAI, Claude Agent SDK)
wrapped = clyro.wrap(your_agent, config=clyro.ClyroConfig(
    agent_name="data-pipeline-agent",
    controls=clyro.ExecutionControls(
        max_steps=100,              # Stop after 100 tool calls
        max_cost_usd=10.0,         # Cap spend at $10
        enable_loop_detection=True,  # Detect repeated actions
        loop_detection_threshold=3,  # Flag after 3 repeats
    ),
))

try:
    result = wrapped.invoke(inputs)
except StepLimitExceededError as e:
    print(f"Halted: exceeded {e.limit} steps")
except CostLimitExceededError as e:
    print(f"Halted: cost ${e.current_cost_usd:.2f} exceeded ${e.limit_usd:.2f}")
except LoopDetectedError as e:
    print(f"Halted: loop detected after {e.iterations} iterations")

This catches the $47K loop on the third iteration, not day 11 — with one clyro.wrap() call.

The Common Thread

Side-by-side comparison showing observability tools detect failures after the fact vs Agent Kernel preventing them at runtime

Five failure modes. One root cause: missing runtime infrastructure.

Failure Mode	Share (n=591)	What's Missing
Context Blindness	31.6%	Grounding verification, policy guardrails
Rogue Actions	30.3%	Business logic constraints, action verification, permission boundaries
Silent Degradation	24.9%	Continuous evaluation, drift detection, regression testing
Memory Corruption	8.1%	Session isolation, state management
Runaway Execution	5.1%	Loop detection, cost bounds, step limits

The Agent Reliability Index (ARI)

The Agent Reliability Index (ARI) is a composite score that measures how well an agent's infrastructure covers each of the five failure modes. Each failure mode maps to a measurable reliability dimension:

Failure Mode	Share	ARI Dimension	What It Measures
Context Blindness	31.6%	Context Integrity (CII)	How well the agent works with verified, up-to-date context
Rogue Actions	30.3%	Action Governance (AGS)	How well actions are bounded by business logic and permissions
Silent Degradation	24.9%	Observability Coverage (OCS)	How well agent behavior is watched for drift and regression
Memory Corruption	8.1%	Memory Consistency (MCS)	How well the agent keeps state separate between sessions and users
Runaway Execution	5.1%	Execution Determinism (EDS)	How reliably execution stays within cost, step, and duration bounds

Your ARI score shows you where your agents are exposed — before the incident, not after.

Audit Your Agent in 15 Minutes

#	Failure Mode (share)	Question	If "No"
1	Context Blindness (31.6%)	Can your agent answer "I don't know" when a question falls outside its verified knowledge?	You need context grounding + fallback responses
2	Rogue Actions (30.3%)	Does your agent have hard limits on what it can do? (Max order size, max refund, restricted file paths)	You need business logic guardrails
3	Silent Degradation (24.9%)	Do you have automated quality scoring running daily on a sample of agent interactions?	You need continuous evaluation + baseline monitoring
4	Memory Corruption (8.1%)	Have you tested your agent under concurrent load? Does User A's data ever appear in User B's session?	You need session isolation testing
5	Runaway Execution (5.1%)	If your agent enters a retry loop right now, what stops it? Is there a cost cap? A step limit?	You need execution bounds (start with: 3 iterations, $10 ceiling, 100 steps)

"No" to 3+: a lot of production exposure. "No" to 1-2: you're ahead of most, but the gaps will hurt you. "Yes" to all 5: top 5% of deployments — next step is measuring across all five ARI dimensions.

What We're Building

Clyro is the Agent Kernel — runtime governance for AI agents. The Prevention Stack comes with defaults for every deployment:

Control	Default	What It Prevents
Business logic guardrails	Configurable per deployment	Context Blindness (31.6%), Rogue Actions (30.3%)
Drift detection	Continuous quality scoring	Silent Degradation (24.9%)
Session isolation	Per-tenant cryptographic isolation	Memory Corruption (8.1%)
Loop detection + cost bounds	3 iterations, $10 ceiling, 100 steps	Runaway Execution (5.1%)

Every prevention action leaves behind evidence — policy violations recorded append-only with which policy fired, what the agent tried, and why it was stopped.

They observe. We prevent.

Get Started with Clyro →

OWASP Agentic Top 10 Alignment

The OWASP Agentic Applications Top 10 (December 2025, 100+ expert contributors) backs up our taxonomy:

Clyro Failure Mode	Share	OWASP ASI Code	OWASP Category
Context Blindness	31.6%	ASI06	Memory & Context Poisoning
Rogue Actions	30.3%	ASI02	Tool Misuse & Exploitation
Silent Degradation	24.9%	ASI09	Human-Agent Trust Exploitation
Memory Corruption	8.1%	ASI06 + ASI03	Memory Poisoning + Identity & Privilege Abuse
Runaway Execution	5.1%	ASI08	Cascading Failures

[Source: OWASP Agentic Applications Top 10]

Full Dataset

This taxonomy is based on 591 documented AI agent failures (2023–2026) of autonomous agents. All percentages come from the full dataset. For the full analysis with sector breakdowns, year-over-year trends, and detailed methodology:

Download the 591-Incident Research Report → (or get notified when it's ready →)

Frequently Asked Questions

What are the 5 AI agent failure modes?

Ranked by how common they are across 591 incidents (2023–2026): Context Blindness (31.6%), Rogue Actions (30.3%), Silent Degradation (24.9%), Memory Corruption (8.1%), and Runaway Execution (5.1%). Together, 88% trace to infrastructure gaps — missing guardrails, permissions, and monitoring — not model quality. Each needs different infrastructure fixes. See the full taxonomy above.

How do AI agent failures differ from LLM failures?

LLMs fail on outputs (one response). Agents fail on execution — wrong actions build up across multi-step chains, lasting for hours or days while reporting "success." Recovery means undoing actions and fixing state, not regenerating a response. The cost model is tokens × steps × duration × downstream impact. Prompting fixes LLM problems; agent problems need infrastructure.

How do you detect AI agent failure modes in production?

Each mode has its own signals. Context Blindness: responses don't match your documentation. Rogue Actions: valid API calls, absurd outcomes. Silent Degradation: satisfaction slowly declines. Memory Corruption: users see others' data. Runaway Execution: costs go up without output. Use the 15-minute audit checklist above to find out how exposed you are.

What is the most dangerous AI agent failure mode?

It depends on what you measure. Runaway Execution: highest direct financial damage ($47K). Context Blindness: highest legal risk (Air Canada precedent). Silent Degradation: most dangerous — Klarna's quality dropped without anyone noticing until the CEO reversed course; GPT-4 math accuracy dropped 95.2 points between versions. The failure mode to worry about most is the one your monitoring can't see.

How do you prevent AI agent failures?

Each mode needs its own infrastructure. Context Blindness: grounded knowledge sources + business logic guardrails. Rogue Actions: permission boundaries + pre-execution checks. Silent Degradation: continuous evaluation + baseline monitoring + model version pinning. Memory Corruption: strict session isolation + PII redaction. Runaway Execution: loop detection (3 iterations) + cost bounds ($10) + step limits (100). None of them need a better model.

What is the Agent Reliability Index?

The ARI maps each failure mode to a measurable dimension: Context Integrity (CII), Action Governance (AGS), Observability Coverage (OCS), Memory Consistency (MCS), and Execution Determinism (EDS). Your score across these five dimensions shows where your agents are vulnerable — before an incident, not after. It shifts the question from "is the model good enough?" to "is the infrastructure complete?"

Can better prompts fix AI agent failures?

No. Four out of five failure modes happen outside the LLM's control. Memory Corruption is a cache bug. Rogue Actions is missing business logic (the McDonald's bot understood the order — it had no quantity limit). Runaway Execution is missing bounds. Silent Degradation is provider-side weight changes. Only Context Blindness can be partially fixed through prompting, and even then the real fix is infrastructure (verified knowledge sources), not prompt engineering.

Related Resources

The $47K Loop: A Complete Forensic Analysis. Full breakdown of Failure Mode 5 (Runaway Execution)
260 McNuggets: When AI Orders for You. Full breakdown of Failure Mode 2 (Rogue Actions)
The Prevention Stack: Beyond Observability. The architecture that prevents all five modes
What is the Agent Kernel?. The infrastructure layer agents are missing

Sources

Moffatt v. Air Canada, 2024 BCCRT 149. Tribunal decision saying companies are liable for what their AI chatbots say
ABA. Companies Remain Liable for AI Chatbot Information. Legal analysis of the Air Canada ruling
The Markup. NYC AI Chatbot Tells Businesses to Break the Law, March 2024. NYC MyCity chatbot investigation
3b. Entrepreneur. NYC AI Chatbot Keeps Getting Facts Wrong, $600K After Launch, 2024. Budget figure source
OpenAI. March 20 ChatGPT Outage, March 2023. Cross-user data leak incident disclosure
n8n Community Forum — User Data Leak Across Concurrent Sessions, June 2025 — Session isolation failure report
5b. n8n GitHub Issue #23890 — Memory Manager Incorrect Session ID Context — Bug confirmed and fix found
Giskard AI — Cross Session Leak Analysis — Healthcare chatbot data leak patterns
CNBC — McDonald's Ends AI Drive-Thru Test, June 2024 — McDonald's AI ordering system failure
Futurism — Chevrolet Chatbot $1 Tahoe, December 2023 — Chevrolet dealership chatbot manipulation
Teja Kusireddy — "We Spent $47,000 Running AI Agents in Production," Towards AI, November 2025 — Main source for the $47K multi-agent loop incident
9b. Tech Startups — $47K AI Agent Failure, November 2025 — Additional coverage of the incident
The Register — SaaStr/Replit Database Destruction, July 2025 — AI agent deletes production database
10b. Fortune — AI Coding Tool Wiped Database in 'Catastrophic Failure', July 2025 — More coverage with Replit CEO response
Answer.AI — "Thoughts On A Month With Devin," January 2025 — Main source for Devin testing (3/20 success rate)
11b. The Register — Devin Poor Reviews, January 2025 — Additional coverage of Devin testing
Tech.co — Klarna Reverses AI Overhaul, 2025 — Klarna's AI customer service reversal
12b. Entrepreneur — Klarna CEO Reverses Course, Hiring More Humans, 2025 — CEO quotes about AI quality decline
Stanford/UC Berkeley — "How Is ChatGPT's Behavior Changing Over Time?", July 2023 — GPT-4 performance regression study
TIME — DPD Chatbot Curses and Criticizes Company, January 2024 — DPD chatbot guardrail failure