An instruction that sounds specific has several legitimate readings — and the model picks one silently.
The companion nobody warns you about: a perfectly clear spec still drifts on a long run.
Asked for a fixed shape, the model invents a slightly better one.
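One way to catch this kind of drift is to reject any reply whose shape differs from the requested one, including replies that add helpful extra fields. The sketch below is illustrative only: the field names in `EXPECTED_SHAPE` and the helper `check_shape` are hypothetical, and it assumes the reply arrives as a JSON string.

```python
import json

# Hypothetical fixed shape: field name -> exact Python type expected after parsing.
EXPECTED_SHAPE = {"title": str, "score": int, "tags": list}

def check_shape(raw, expected=EXPECTED_SHAPE):
    """Return a list of problems; an empty list means the shape matches exactly."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level is not an object"]

    problems = []
    # Fields the spec asked for that the reply left out.
    for key in sorted(expected.keys() - data.keys()):
        problems.append(f"missing field: {key}")
    # Fields the reply invented. These are the "slightly better" additions.
    for key in sorted(data.keys() - expected.keys()):
        problems.append(f"unexpected field: {key}")
    # Exact type match, so a JSON boolean does not pass as an int.
    for key in sorted(expected.keys() & data.keys()):
        if type(data[key]) is not expected[key]:
            problems.append(f"wrong type for {key}: expected {expected[key].__name__}")
    return problems
```

Treating an unexpected field as a failure, rather than quietly ignoring it, is the design choice that matters here: the extra field is usually the first visible sign that the shape has started to move.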
A number nobody computed, stated fluently.
The story and the evidence beside it disagree — or no one can see whether they agree.
Individually correct queries combine into nonsense. The join is where correctness dies.
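A small illustration of how two correct inputs can produce a wrong total once joined, and one possible guard. The data and the helpers `naive_join` and `join_one_to_one` are made up for this sketch. It assumes the join is meant to be one-to-one on the key, which will not hold for every join.

```python
from collections import Counter

def naive_join(left, right, key):
    """Inner join with no cardinality check: repeated keys silently multiply rows."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def join_one_to_one(left, right, key):
    """Inner join that refuses to run if either side repeats a key value."""
    for name, rows in (("left", left), ("right", right)):
        counts = Counter(row[key] for row in rows)
        dupes = sorted(k for k, n in counts.items() if n > 1)
        if dupes:
            raise ValueError(f"{name} side has duplicate {key} values: {dupes}")
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]

# Both inputs are correct on their own (hypothetical data).
orders = [{"order_id": 1, "revenue": 100}, {"order_id": 2, "revenue": 50}]
shipments = [
    {"order_id": 1, "box": "a"},
    {"order_id": 1, "box": "b"},  # order 1 shipped in two boxes
    {"order_id": 2, "box": "c"},
]

true_revenue = sum(o["revenue"] for o in orders)
joined_revenue = sum(r["revenue"] for r in naive_join(orders, shipments, "order_id"))
print(true_revenue, joined_revenue)  # 150 250
```

With the same inputs, `join_one_to_one(orders, shipments, "order_id")` raises a `ValueError` instead of returning the inflated rows, so the problem surfaces at the join rather than in the report.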
The data is right. The sentence makes the reader compute something false.
No layer proved correctness, and the deeper the orchestration, the harder the failure is to localize.
A step quietly dropped — output still looks complete. Success by appearance.
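A minimal sketch of one check for this, assuming the run can report which steps it actually executed. The step names and the helper `audit_steps` are hypothetical. The idea is to compare the plan against the report rather than judge completeness by how the output looks.

```python
def audit_steps(planned, reported):
    """Compare the steps a run was supposed to perform with the ones it reported."""
    planned_set = set(planned)
    reported_set = set(reported)
    return {
        # Planned but never reported: the quietly dropped steps.
        "missing": [s for s in planned if s not in reported_set],
        # Reported but never planned: worth a look as well.
        "unplanned": [s for s in reported if s not in planned_set],
    }

result = audit_steps(
    planned=["fetch", "dedupe", "aggregate", "render"],
    reported=["fetch", "aggregate", "render"],
)
print(result)  # {'missing': ['dedupe'], 'unplanned': []}
```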
The telemetry itself silently fails. The run looks clean because nothing was watching.
A stop condition keyed to a signal that never fires. The loop looks patient; it’s actually deaf.
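One common defence is to pair the intended stop signal with hard bounds, and to report which of them actually ended the loop. This is a sketch under assumptions: `run_until`, its parameters, and the default limits are hypothetical, and `step` stands in for whatever one iteration of the loop does.

```python
import time

def run_until(step, is_done, max_steps=50, max_seconds=30.0, clock=time.monotonic):
    """Call step() until is_done(result) is true or a hard bound is hit.

    Returns (reason, steps_taken), where reason is "done", "max_steps", or "timeout".
    """
    start = clock()
    for n in range(1, max_steps + 1):
        result = step()
        if is_done(result):
            return "done", n
        # The time bound is checked after each step, so a single slow step can overrun it.
        if clock() - start > max_seconds:
            return "timeout", n
    return "max_steps", max_steps

# A stop signal that never fires: the loop ends on the step bound and says so.
print(run_until(lambda: "working", lambda r: r == "FINISHED", max_steps=5))
# ('max_steps', 5)
```

Returning the reason is the useful part. A caller that sees `"max_steps"` or `"timeout"` knows the stop condition never fired, instead of mistaking an exhausted loop for a patient one.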
Paying a language model to be a for-loop.
Inline drifts. Full fan-out goes blind. The shape is a per-step decision.
Loading everything up front performs worse than loading nothing.
Long sessions summarize themselves. Summaries round. Downstream builds on the rounding.
A lesson you can’t find is a lesson you re-learn. Even the system you built to prevent this.
A second pass from the same model is polish, not verification.
A declared capability nothing verifies is a lie waiting to ship.
A gate built ahead of need rots into theater — and its presence implies false coverage.
After writing this post, I got curious whether anyone else was cataloguing these failures, so I went looking. I had no prior familiarity with this research, but two maps stood out.
Fourteen multi-agent failure modes, derived from 1,600+ annotated execution traces. arXiv March 2025; published at NeurIPS, December 2025.
Failure modes of agentic systems under attack, from the Microsoft AI Red Team. Whitepaper April 2025; updated June 4, 2026.
A few of the modes above I haven’t found on either map — the evidence-surface fork (B5), the corrupting join (B6), the correct number that reads false (B7). Not because they’re exotic: Berkeley measures benchmarks, Microsoft measures attack surfaces, and this catalog measures output that has to be right in front of a paying customer. It isn’t the exhaustive map either. It’s the map of where I’ve actually been.