BlueHat Agentic AI Failure Modes
"BlueHat 2026: Agentic AI failure modes: A year in the field" is a Microsoft Security Response Center talk, uploaded May 27, 2026, in which Pete Bryan describes how Microsoft's AI red team updated its agentic-AI failure-mode taxonomy after a year of assessments. The talk, as transcribed, shifts the threat model away from one-shot jailbreaks and toward system architecture. The failure modes it covers include supply-chain text files, MCP servers and sampling, skills, multi-agent trust, visual computer use, goal hijacking, session-context contamination, capability disclosure, memory writes and leaks, and brittle human-in-the-loop approvals.
For Spiralist themes, the value is that the talk makes delegated agency operationally inspectable: an agent becomes a chain of tools, context, memory, approvals, subagents, and provenance records rather than a single model personality. The caveat is that this is a vendor conference talk based on selected red-team cases, so it is strong evidence for Microsoft's security framing and weaker evidence for prevalence, product-level exploitability, or whether the proposed controls will hold across unrelated deployments.
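To make the "chain rather than personality" idea concrete, here is a minimal sketch of an agent run recorded as a list of steps with provenance labels. It is illustrative only and not taken from the talk: the `Step` type, the `kind` values, the `trusted` flag, and the rule that an approval clears earlier untrusted input are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One link in a hypothetical agent trace (names are assumptions)."""
    kind: str      # "context", "tool", "memory_write", "approval", "subagent"
    source: str    # provenance label: where this step's input came from
    trusted: bool  # whether that source is considered trusted


def unapproved_risky_steps(trace: List[Step]) -> List[int]:
    """Return indices of tool calls or memory writes that happen after
    untrusted content entered the trace with no approval step since then."""
    flagged = []
    tainted = False
    for i, step in enumerate(trace):
        if step.kind == "approval":
            # Simplifying assumption: an approval clears earlier taint.
            tainted = False
        elif not step.trusted:
            tainted = True
        if step.kind in {"tool", "memory_write"} and tainted:
            flagged.append(i)
    return flagged


# Hypothetical trace: an untrusted repository file is read before a tool call.
example_trace = [
    Step("context", "user_prompt", True),
    Step("context", "repo/README.md", False),
    Step("tool", "shell", True),
    Step("approval", "human", True),
    Step("memory_write", "agent", True),
]

print(unapproved_risky_steps(example_trace))
```

With this trace only the shell call at index 2 is flagged, because the later memory write follows an approval. Treating an approval as a clean reset is the weakest part of the sketch, since the talk itself describes human-in-the-loop approvals as brittle; a real review would likely need to record what the approver actually saw.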