Why Air Canada's AI Chatbot Lost a Lawsuit: The Cost of Context Drift

How the Air Canada AI Chatbot Lost a Lawsuit over a Misstated Bereavement Policy, and What It Means for Every Enterprise

A forensic analysis of the Air Canada chatbot case and the governance controls that could have prevented it.

TL;DR: In 2024, a Canadian tribunal held Air Canada liable after its AI chatbot gave a grieving passenger wrong bereavement fare information. The airline tried to argue the chatbot was "a separate legal entity." It lost. The root cause wasn't hallucination; it was context drift: the chatbot served policy information that contradicted the airline's own webpage. This is the failure mode every enterprise deploying customer-facing AI needs to understand.

The Moffatt Case

On November 11, 2022, Jake Moffatt's grandmother died in Ontario. That same day, from Vancouver, Moffatt asked Air Canada's website chatbot about bereavement fares. It told him he could apply retroactively:

"If you need to travel immediately or have already travelled and would like to submit your ticket for a reduced bereavement rate, kindly do so within 90 days of the date your ticket was issued by completing our Ticket Refund Application form."

[Source: Chatbot screenshot submitted as evidence in Moffatt v. Air Canada, 2024 BCCRT 149]

A phone agent quoted him $380/flight, consistent with the chatbot. Moffatt booked a flight to Toronto on Nov 12 and a return on Nov 16, for CA$1,640.36 total. On Nov 17, within the 90-day window, he submitted his refund application with the death certificate.

Air Canada denied the claim. The actual "Bereavement travel" policy page, linked from the chatbot's own answer, said the bereavement policy "does not apply to requests for bereavement consideration after travel has been completed." The chatbot had told him the opposite of the airline's own policy.

Three months of emails followed. On February 5, 2023, Moffatt sent the chatbot screenshot and death certificate. An Air Canada rep acknowledged the chatbot had provided "misleading words," but offered no refund. Moffatt filed with the British Columbia Civil Resolution Tribunal.

The Tribunal Ruling

On February 14, 2024, Tribunal Member Christopher C. Rivers ruled for Moffatt on every count.

Air Canada's primary defense: it "cannot be held liable for information provided by one of its agents, servants, or representatives, including a chatbot." The airline argued the chatbot was a separate legal entity responsible for its own actions.

Rivers's response:

"This is a remarkable submission. While a chatbot has an interactive component, it is still just a part of Air Canada's website. It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot."

Air Canada's fallback, that Moffatt should have cross-checked the chatbot against the "Bereavement travel" page, also failed:

"There is no reason why Mr. Moffatt should know that one section of Air Canada's webpage is accurate, and another is not."

The tribunal found negligent misrepresentation: Air Canada owed Moffatt a duty of care as a service provider, and "did not take reasonable care to ensure its chatbot was accurate." Moffatt's reliance on the chatbot was "reasonable in the circumstances," and he would not have flown last-minute had he known the full fare applied.

Damages: CA$650.88 (fare difference) + CA$36.14 (pre-judgment interest) + CA$125 (tribunal fees) = CA$812.02. [Source: Moffatt v. Air Canada, 2024 BCCRT 149; ABA analysis]

CA$812 is a rounding error. The precedent isn't. Every company deploying customer-facing AI is now on notice: you own what your chatbot says, even when it contradicts your own policies.

[Figure: The Air Canada Chatbot Incident: Timeline]

What Did the Air Canada AI Chatbot Actually Do?

This wasn't hallucination. The chatbot didn't invent a bereavement policy from nothing. It described a plausible version that contradicted the actual policy on the same website.

This is context drift: the agent operates on context that is stale, incomplete, or inconsistent with the authoritative source, and presents it to the user with full confidence.

Inference disclosure: Air Canada has not publicly disclosed the architecture of its chatbot. Analysis below draws on common 2022 chatbot architectures, tribunal-documented behavior, and publicly available legal/technical commentary.

The Likely Mechanism

2022 customer-service chatbots typically used intent-matching (rule-based retrieval from a curated knowledge base) or RAG (LLM synthesis over retrieved documents). Either way, the chatbot's answer is only as accurate as the corpus it reads from.

Context drift occurs three ways:

Stale knowledge base. Policy updates on the website don't propagate to the chatbot's source, so the chatbot serves yesterday's rules.
Wrong document retrieved. The retrieval layer pulls an adjacent document (an older version, a related policy, an internal draft) and synthesizes from that.
Synthesis misrepresentation. Even with the right document, the LLM subtly distorts it, so "you must apply before travel" becomes "you can apply within 90 days after travel."

Industry literature labels overlapping variants of these modes context engineering failures, specification drift, and agentic drift. Different names, shared root cause: the agent's context diverges from ground truth and nothing validates it before the user sees the answer.

The Air Canada case most plausibly combines the first two: stale or incomplete policy data, confidently served. [Source: Hackaday analysis; Verdantix analysis]

[Figure: Context Drift: When the Chatbot Reads the Wrong Policy]

Why Context Drift Is Worse Than Hallucination

Hallucination is fabrication: the LLM invents something with no basis. Context drift is more insidious. The information looks correct because it is plausible, came from an authoritative-seeming source, and may even have been correct at some point.

A hallucinated answer feels wrong. A context-drifted answer feels right, to users and to any quality check measuring coherence rather than ground truth. The Air Canada chatbot's answer was coherent, helpful, specific, and wrong.

Why Couldn't Observability Have Caught the Air Canada AI Chatbot Error?

Imagine Air Canada had full observability: request/response logs, latency, satisfaction scores. What would they show?

Request: user asks about bereavement fare policy.
Response: detailed, confident answer about retroactive applications within 90 days.
Latency: normal.
User behavior: books flights (positive engagement signal).
Satisfaction: no complaint filed until months later.

Every metric is green. The response is well-formed. The user engaged. No error was thrown. The chatbot did exactly what it was designed to do.

Observability tools monitor system behavior, not context accuracy. They show you the chatbot responded; they don't show you the knowledge source is out of sync with the canonical policy page. Observability answers "did the system respond?" It doesn't answer "is the knowledge base current? Does the context match the authoritative source?" Those questions require different infrastructure.

Logs would have shown Moffatt got a response. They would not have flagged that the response contradicted Air Canada's own policy. The failure wasn't in generation. It was in the context generation ran on.
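
To make the gap concrete, here is a minimal sketch of a typical health check over a trace record. The `TraceRecord` fields, the `looks_healthy` function, and the latency threshold are all hypothetical, chosen only to mirror the metrics listed above; they do not describe Air Canada's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """Hypothetical per-interaction record an observability tool might keep."""
    status_code: int
    latency_ms: float
    user_converted: bool    # e.g. the user went on to book a flight
    complaint_filed: bool


def looks_healthy(record: TraceRecord, max_latency_ms: float = 2000.0) -> bool:
    """Health check built only from system-behavior signals."""
    return (
        record.status_code == 200
        and record.latency_ms <= max_latency_ms
        and not record.complaint_filed
    )


# Illustrative values only: a normal, fast response followed by a booking.
drifted_interaction = TraceRecord(
    status_code=200, latency_ms=850.0, user_converted=True, complaint_filed=False
)
print(looks_healthy(drifted_interaction))
```

Nothing in the record refers to the source document the answer was built from, so a check like this has no way to fail on a drifted answer.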

Prevention: Context Intelligence

The fix is not monitoring but runtime governance: a system that validates the chatbot's knowledge against authoritative sources before the response reaches the user.

Preventing Context Drift: Four Controls

1. Source Binding

Every response should trace to a specific, versioned source document: not "the knowledge base" in general, but a specific policy page, version, and timestamp. If the source says "does not apply after travel" and the chatbot is about to say the opposite, that's a detectable conflict.

Implementation: every response carries metadata identifying source document(s), version hashes, and last-verified timestamps. Responses from sources older than a threshold (e.g., 24 hours) get flagged or blocked.
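
A minimal sketch of what that metadata could look like, assuming the source text is available as a string when the answer is built. `SourceBinding`, `bind_source`, and `is_stale` are hypothetical names introduced for illustration, and the 24-hour default simply mirrors the example threshold above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SourceBinding:
    """Hypothetical record tying one response to one versioned source document."""
    url: str
    content_hash: str         # SHA-256 of the source text the answer was built from
    last_verified: datetime   # when that text was last checked against the live page


def bind_source(url: str, source_text: str, verified_at: datetime) -> SourceBinding:
    """Build the binding that travels with the response as metadata."""
    digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    return SourceBinding(url=url, content_hash=digest, last_verified=verified_at)


def is_stale(binding: SourceBinding, now: datetime,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """True when the source was last verified longer ago than max_age."""
    return now - binding.last_verified > max_age


# Example with a placeholder URL and placeholder policy text.
now = datetime(2022, 11, 11, 12, 0, tzinfo=timezone.utc)
binding = bind_source(
    "https://example.com/policies/bereavement-travel",
    "placeholder policy text",
    verified_at=now - timedelta(hours=30),
)
if is_stale(binding, now):
    print("flag or block: source not verified within threshold")
```

Hashing the exact text the answer was built from gives a later check something concrete to compare against the live page.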

2. Freshness Checks

Is the knowledge base in sync with the live website? Have policy pages been updated since the corpus was last refreshed?

Implementation: automated reconciliation between chatbot KB and canonical sources. Daily minimum; hourly for high-risk domains (pricing, legal). On diff, block answers on that topic or surface a disclaimer.
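
One way to sketch the reconciliation step, assuming both the knowledge base and the canonical pages have already been fetched into topic-to-text mappings. `fingerprint` and `find_drifted_topics` are hypothetical helpers, and the policy strings are illustrative rather than quotes from any real page.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash whitespace-normalized text so reflowed pages do not count as changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_drifted_topics(kb: dict[str, str], canonical: dict[str, str]) -> set[str]:
    """Return KB topics whose canonical page is missing or differs from the KB copy."""
    drifted = set()
    for topic, kb_text in kb.items():
        live_text = canonical.get(topic)
        if live_text is None or fingerprint(kb_text) != fingerprint(live_text):
            drifted.add(topic)
    return drifted


# Illustrative data only.
kb = {
    "bereavement": "Apply within 90 days after travel.",
    "baggage": "One  bag\nfree.",
}
canonical = {
    "bereavement": "Apply before travel.",
    "baggage": "One bag free.",
}
blocked_topics = find_drifted_topics(kb, canonical)
print(blocked_topics)
```

A scheduler would run this daily or hourly and feed `blocked_topics` to whatever layer blocks answers or attaches the disclaimer. How the pages are fetched is left out on purpose.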

3. Context Validation

Before serving, validate the response against the current authoritative source. This is a runtime check that catches drift in the moment.

Implementation: for high-stakes categories (refunds, legal, pricing), route the response through a validation layer comparing its claims against the current source. If it contradicts, block and route to a human.
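
The routing logic can be sketched as a small gate. Everything here is an assumption made for illustration: the `gate_response` name, the category labels, the escalation message, and above all the `is_consistent` callable, which stands in for whatever claim-comparison method a team actually uses (rules, a classifier, a second model).

```python
from typing import Callable

# Categories taken from the examples above; a real list would be longer.
HIGH_STAKES = {"refunds", "legal", "pricing"}

ESCALATION_MESSAGE = "Let me connect you with an agent who can confirm this policy."


def gate_response(category: str, draft: str, current_source: str,
                  is_consistent: Callable[[str, str], bool]) -> tuple[str, str]:
    """Return (action, text): serve the draft, or block it and route to a human."""
    if category not in HIGH_STAKES:
        return ("serve", draft)
    if is_consistent(draft, current_source):
        return ("serve", draft)
    return ("escalate", ESCALATION_MESSAGE)
```

Keeping the checker as a parameter separates the policy decision (what counts as high-stakes, what happens on failure) from the harder problem of deciding whether two texts agree.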

4. Contradiction Detection

The Air Canada chatbot linked to the "Bereavement travel" page that directly contradicted its own answer. Contradiction detection catches exactly this.

Implementation: cross-reference the response against any documents it links. If the response asserts X and the linked page asserts not-X, halt.
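
A toy version of the X versus not-X check, assuming the claims in the response and in the linked page have already been extracted into named true/false propositions. That extraction is the hard part and is not shown; `find_contradictions` and the claim key are hypothetical.

```python
def find_contradictions(response_claims: dict[str, bool],
                        linked_page_claims: dict[str, bool]) -> list[str]:
    """Return claim keys that both sides assert with opposite truth values."""
    return sorted(
        key for key, value in response_claims.items()
        if key in linked_page_claims and linked_page_claims[key] != value
    )


# The chatbot says retroactive requests are allowed; the linked page says they are not.
response = {"refund_after_travel_allowed": True}
linked_page = {"refund_after_travel_allowed": False}

conflicts = find_contradictions(response, linked_page)
if conflicts:
    print("halt:", conflicts)  # prints: halt: ['refund_after_travel_allowed']
```

Claims that appear on only one side are ignored here, since silence on a point is not a contradiction.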

None of this is speculative. Source binding, freshness checks, context validation, and contradiction detection are engineering problems with known solutions. They just weren't in place.

Broader Pattern

Air Canada isn't isolated. Context drift appears across every industry deploying customer-facing AI.

NYC MyCity: Wrong Advice at Municipal Scale

New York City launched the MyCity chatbot in October 2023 to help small business owners navigate regulations. Foundational build reportedly cost ~$600,000 on Microsoft Azure.

A March 2024 Markup investigation found the chatbot systematically giving advice that would expose business owners to legal liability:

Workers' tips: told owners they could take a cut of workers' tips, which violates NY Labor Law §196-d.
Housing vouchers: told landlords they could refuse Section 8 tenants, illegal in NYC since 2008.
Cash payments: told stores they could go cashless; NYC has required cash acceptance since 2020.
Minimum wage: quoted $15/hr. It was $16/hr.

Asked if it could be relied on for professional advice, the chatbot answered "Yes," directly contradicting the disclaimer on the same page. It stayed publicly accessible for months after the problems were documented. Mayor Zohran Mamdani shut it down in January 2026, calling it "functionally unusable." [Source: The Markup, March 2024; April 2024; January 2026]

Same failure mode as Air Canada: the chatbot's context (city regulations) was stale or incomplete, and it gave confident answers that contradicted actual law. No one verified the context before deployment or monitored it afterward.

DPD: When a System Update Removes the Guardrails

In January 2024, UK delivery company DPD's chatbot started swearing at customers, writing self-deprecating poetry, and calling DPD "the worst delivery firm in the world." Customer Ashley Beauchamp posted screenshots to X. They hit 1.3M views.

DPD confirmed the cause: a system update the previous day had broken the chatbot's behavioral constraints. Before the update, it worked normally; after, the profanity and off-topic guardrails were gone.

A different flavor of context drift: not stale policy data, but stale behavioral constraints. A system update invalidated the rules, and nobody validated that the guardrails were still intact after the deploy. [Source: TIME, January 2024; ITV News]

The Common Thread

Three incidents, three industries, three different architectures. Same root cause: the agent operated on context that didn't match reality, and nothing validated that context before it reached the user.

Incident	Context That Drifted	Consequence
Air Canada	Bereavement fare policy	CA$812 judgment + legal precedent
NYC MyCity	City regulations on tips, housing, wages	Business owners advised to break the law
DPD	Behavioral constraints on language and tone	Viral brand damage, 1.3M views

Resolution

The Air Canada ruling established what should have been obvious: you are liable for what your AI says. Not your chatbot vendor, not the LLM provider, not the "separate legal entity" running on your website. You.

It's negligent misrepresentation applied to a new medium, and the "chatbot is just a tool, we disclaim liability" defense wasn't even attempted by Air Canada (the tribunal specifically noted this).

The technical failure beneath this liability is context drift. Knowledge diverged from the source of truth; nobody detected it; a customer was harmed. The fix isn't observability, which at best surfaces drift after it has reached the user. The fix is runtime governance: source binding, freshness checks, context validation, and contradiction detection, deployed as infrastructure between agent and user that enforces context integrity in real time. That's the Context Integrity dimension of the Prevention Stack: the difference between knowing your chatbot responded and knowing it responded correctly.

Your Context Integrity Checklist

Before deploying customer-facing AI, verify:

Every response is bound to a specific, versioned source document
Freshness checks reconcile the knowledge base against canonical sources on a defined schedule
High-stakes categories (pricing, legal, refunds) pass runtime validation against current source
Contradiction detection cross-references responses against any documents linked in them

Every enterprise with customer-facing AI has the same exposure Air Canada had. Build the infrastructure, or wait for the tribunal.

Get Started

Install the SDK and add runtime governance to your agents in under a minute.

pip install clyro

Free tier: 10 agents, 100K traces/month, no credit card required.

Works with LangGraph, CrewAI, Claude Agent SDK, Anthropic SDK, and any Python callable.

FAQ

Did Air Canada actually lose a lawsuit over its chatbot?

Yes. On February 14, 2024, the British Columbia Civil Resolution Tribunal ruled against Air Canada in Moffatt v. Air Canada, 2024 BCCRT 149. The tribunal found the airline liable for negligent misrepresentation: its chatbot told Jake Moffatt he could apply retroactively for a bereavement fare, contradicting the policy on Air Canada's own website. Damages: CA$812.02.

What's the difference between AI hallucination and context drift?

Hallucination is fabrication: the LLM invents a fact with no basis. Context drift is more insidious: the chatbot's context (knowledge base, retrieved documents, embedded policy) is stale or inconsistent with the authoritative source, and the answer looks plausible. Air Canada's chatbot didn't invent a bereavement policy; it described a drifted version of one.

Couldn't better observability or monitoring have caught the Air Canada error?

No. Observability would have shown the response completed normally, the user engaged (booked flights), metrics green. Observability answers "did the system respond?", not "is the context current or consistent with the source of truth?" Those are different questions requiring runtime governance, not monitoring.

What are the 4 mechanisms of context intelligence?

Four mechanisms: (1) Source Binding: every response ties to a versioned document with hash + last-verified timestamp; (2) Freshness Checks: automated reconciliation between knowledge base and canonical sources, hourly for high-stakes domains; (3) Context Validation: runtime check against current source before serving; (4) Contradiction Detection: cross-reference response against any linked documents.

Is the CA$812 damage award actually significant?

The CA$812 is a rounding error. The precedent isn't. The ruling established that companies are legally responsible for what their AI chatbots say, even when it contradicts their own policies, even with terms-of-service disclaimers. The tribunal rejected the "separate legal entity" argument. After Moffatt, every customer-facing AI deployment is on notice.

How does this apply to enterprises outside airlines?

Same failure mode across industries. NYC's MyCity chatbot (~$600K on Microsoft Azure) gave business owners advice that would violate labor, housing, and cash-acceptance law. DPD's UK chatbot started cursing at customers after a system update invalidated its behavioral guardrails. Different sectors, same root cause: an agent operating on context inconsistent with reality, with no runtime validation layer.

What is runtime governance and how is it different from moderation?

Moderation filters AI output after the fact (profanity, toxic content). Runtime governance sits between model and environment and enforces constraints at execution time: source binding, freshness checks, context validation, contradiction detection. Moderation would not have caught the Air Canada or MyCity failures, because those chatbots weren't generating "bad" output. They were generating plausible, policy-inconsistent output. That requires governance, not moderation.

Related Resources

The 5 Agent Failure Modes: Context drift maps to Failure Mode 1: Context Blindness
The $47K Loop: A Complete Forensic Analysis: What happens when agent failures compound without bounds
The Prevention Stack: Beyond Observability: The architecture that prevents context drift and four other failure modes
260 McNuggets: When AI Orders for You: When the agent's action is wrong, not just the context

Sources

[1] Moffatt v. Air Canada, 2024 BCCRT 149 - Full tribunal decision establishing company liability for AI chatbot statements. https://www.canlii.org/en/bc/bccrt/doc/2024/2024bccrt149/2024bccrt149.html

[2] ABA - Companies Remain Liable for AI Chatbot Information - Legal analysis of the Air Canada ruling. https://www.americanbar.org/groups/business_law/resources/business-law-today/2024-february/bc-tribunal-confirms-companies-remain-liable-information-provided-ai-chatbot/

[3] Hackaday - Air Canada's Chatbot: Why RAG Is Better Than an LLM for Facts - Technical analysis of the chatbot architecture failure. https://hackaday.com/2024/02/28/air-canadas-chatbot-why-rag-is-better-than-an-llm-for-facts/

[4] Verdantix - What Software Vendors Can Learn from the Air Canada Chatbot Case - Industry analysis of the Air Canada incident. https://www.verdantix.com/insights/blog/what-can-industrial-software-vendors-learn-from-the-air-canada-chatbot-hallucination-case

[5] The Markup - NYC AI Chatbot Tells Businesses to Break the Law, March 2024 - NYC MyCity chatbot investigation. https://themarkup.org/news/2024/03/29/nycs-ai-chatbot-tells-businesses-to-break-the-law

[6] The Markup - Malfunctioning NYC AI Chatbot Still Active, April 2024 - Follow-up on NYC MyCity chatbot remaining active. https://themarkup.org/news/2024/04/02/malfunctioning-nyc-ai-chatbot-still-active-despite-widespread-evidence-its-encouraging-illegal-behavior

[7] The Markup - Mamdani to Kill the NYC AI Chatbot, January 2026 - NYC MyCity chatbot finally shut down. https://themarkup.org/artificial-intelligence/2026/01/30/mamdani-to-kill-the-nyc-ai-chatbot-we-caught-telling-businesses-to-break-the-law

[8] TIME - DPD Chatbot Curses and Criticizes Company, January 2024 - DPD chatbot guardrail failure. https://time.com/6564726/ai-chatbot-dpd-curses-criticizes-company/

[9] ITV News - DPD Disables AI Chatbot, January 2024 - DPD chatbot incident reporting. https://www.itv.com/news/2024-01-19/dpd-disables-ai-chatbot-after-customer-service-bot-appears-to-go-rogue
