YouTube Review

Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents

39C3 - Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents is Johann Rehberger's Chaos Computer Club talk on what changes when language models do not only answer, but click, read files, call tools, edit code, run commands, and operate computers. The official CCC page frames the talk around end-to-end prompt-injection exploits against computer-use and coding agents, including data exfiltration, remote code execution, command-and-control style compromise, and long-term prompt-injection persistence.

The useful model is Rehberger's "AI kill chain": indirect prompt injection, confused deputy behavior, and automatic tool invocation. The transcript walks through agents reading hostile web pages, issue text, code comments, filenames, hidden Unicode instructions, and other untrusted context, then converting that context into tool use. The examples span Claude Computer Use, Devin, Claude Code, Amazon Q, Google Jules, Google Antigravity, GitHub Copilot, and multi-agent coding workflows. The strongest lesson is not any one product bug. It is that agentic systems make untrusted text operational: a page, ticket, repository, or dependency can become an instruction source for a system that has real tools.

That belongs beside Prompt Injection, AI Agents, AI Browsers and Computer Use, AI Coding Agents, Agent Tool Permission Protocol, and Agent Audit and Incident Review. The practical security principle is blunt: prompt instructions are not security controls. Rehberger calls out "prompt begging" because telling a model not to leak data is not the same as preventing a tool call from leaking data. Real controls sit downstream of model output: sandboxing, scoped credentials, no ambient secrets, deterministic command gates, constrained tool permissions, meaningful human approval, memory provenance, output validation, and logs that let responders reconstruct what happened.

The surrounding evidence supports the frame. Rehberger's Trust No AI paper argues that prompt injection can undermine confidentiality, integrity, and availability across real LLM applications. OWASP's LLM01:2025 Prompt Injection page explicitly includes direct and indirect prompt injection, sensitive-information disclosure, unauthorized access, arbitrary command execution in connected systems, and critical-decision manipulation as possible outcomes. OWASP's older LLM Top 10 also names insecure output handling, plugin design, excessive agency, and overreliance as adjacent risks. The talk is therefore best read as a live demonstration of why agent security cannot be reduced to better prompts.

Evidence and limits: this is a research and conference talk built from selected exploit demonstrations and responsible-disclosure work, not a prevalence study of all coding agents or a current guarantee that every named product remains vulnerable in the same way. Some issues described in the transcript were patched, some mitigations depend on deployment choices, and product behavior changes quickly. The durable contribution is the threat model: assume the model can be influenced by untrusted context; assume breach of the agent; ask what the agent can read, write, execute, remember, and send; then place enforceable controls below the model rather than inside its wishful instructions.

Return to YouTube