The Skill Scanner Becomes the Detonation Harness
Agent skills turn capability sharing into a software supply chain. A scanner that inspects only the install-time bundle cannot see every behavior the agent may later materialize.
The Paper
The paper is Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware, arXiv:2607.02357 [cs.CR, cs.SE]. The arXiv record lists Zimo Ji, Congying Xu, Zongjie Li, Yudong Gao, Xin Wei, Shuai Wang, and Shing-Chi Cheung as authors and records version 1 as submitted on July 2, 2026. The PDF gives affiliations at the Hong Kong University of Science and Technology and the Guangzhou HKUST Fok Ying Tung Research Institute, China.
The paper belongs next to the site's pages on skill supply chains, tool boundaries, and agent receipts, but its angle is different. It does not merely say that third-party skills can be risky. It tests whether the scanners used to approve those skills still work when the malicious behavior is preserved but the visible packaging changes.
The Skill Surface
An agent skill is a portable bundle: natural-language instructions, scripts, resources, and sometimes setup steps that a coding agent can load and execute. Once installed, it may run with the agent's access to the local workspace, files, credentials, package managers, terminal, and network. That makes a skill marketplace closer to a package repository than a prompt library.
The paper frames malicious skills as a supply-chain problem. A poisoned bundle can be visible in SKILL.md, split between prose and scripts, or staged through resources that are only used during execution. Static skill scanners answer one question: does the bundle look dangerous before the agent runs it? The paper's point is that this is not the same as asking what the bundle causes the agent to do.
The Cloak
The authors study eight representative skill scanners and two production agents, Codex and Claude Code, using 1,613 in-the-wild malicious skills from the ClawHub/OpenClaw skills archive. Their evasion framework, SkillCloak, keeps the malicious behavior intact while changing how the payload is presented to scanners. At a high level, one strategy rewrites visible indicators into semantically equivalent forms. The other hides malicious components from the install-time view and restores them during agent execution.
The results are the important part, not the mechanics. Self-extracting skill packing achieves a bypass rate above 90 percent against every surveyed scanner. Structural obfuscation exceeds 80 percent bypass on most static scanners and reaches 96 percent on a hybrid scanner. The paper also tests whether cloaking ruins utility. Across GPT-5.4 via Codex and Claude Sonnet 4.6 via Claude Code, it reports no statistically detectable utility degradation, with execution trajectories remaining highly similar to those of the raw skills.
The Detonation Harness
The proposed defense is SkillDetonate, a behavior-centric runtime auditor. Instead of judging the install-time appearance of a skill, it executes the skill in a controlled sandbox and watches for security-relevant effects at operating-system boundaries. The design combines on-demand closure lift, which observes instructions materialized during execution, with marker-based taint analysis, which follows sensitive-data flow across agent context, files, processes, and network operations.
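The marker idea can be illustrated with a small sketch. This is not SkillDetonate's implementation. It assumes a hypothetical Event record for effects observed at an OS boundary, plants a unique canary string in each decoy secret, and reports any event whose payload carries that canary, either raw or in a simple re-encoding. The names Event, make_marker, encodings, and find_flows are invented for illustration, and a real auditor would need to handle far more transformations (compression, encryption, chunking) than the three checked here.

```python
import base64
import secrets
from dataclasses import dataclass


@dataclass
class Event:
    """One observed effect at an OS boundary (hypothetical schema)."""
    kind: str       # e.g. "file_write", "net_send", "proc_exec"
    target: str     # path, host:port, or command line
    payload: bytes  # bytes that crossed the boundary


def make_marker(label: str) -> str:
    """Create a unique canary string to plant inside a decoy secret."""
    return f"TAINT-{label}-{secrets.token_hex(8)}"


def encodings(marker: str) -> list[bytes]:
    """Return the raw marker plus a few common re-encodings of it."""
    raw = marker.encode()
    return [raw, base64.b64encode(raw), raw.hex().encode()]


def find_flows(markers: dict[str, str], events: list[Event]) -> list[tuple[str, str, str]]:
    """Report (source label, sink kind, sink target) for each marker seen at a sink."""
    flows = []
    for label, marker in markers.items():
        needles = encodings(marker)
        for event in events:
            if any(needle in event.payload for needle in needles):
                flows.append((label, event.kind, event.target))
    return flows


# Usage: one decoy credential, one harmless write, one encoded exfiltration.
markers = {"aws_key": make_marker("aws_key")}
leaked = base64.b64encode(markers["aws_key"].encode())
events = [
    Event("file_write", "/tmp/notes.txt", b"hello"),
    Event("net_send", "203.0.113.7:443", b"data=" + leaked),
]
for source, kind, target in find_flows(markers, events):
    print(f"{source} reached {kind} sink {target}")
```

Only the second event is reported, because it is the only payload carrying the canary. Matching on substrings keeps the sketch short, but it will miss a marker embedded in a larger blob before encoding, which is one reason the absence of a reported flow should not be read as proof of safety.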
On the SkillJect benchmark, the paper reports that SkillDetonate detects 96.7 percent of attacks at a 2 percent false-positive rate. On executable wild malicious skills from MalSkillBench, it detects 87.3 percent on average across original, obfuscated, and packed forms. In the same wild setting, the paper reports that Cisco's scanner drops from 98.6 percent detection to 10.1 percent under structural obfuscation. The contrast is the governance lesson: byte inspection is hygiene, not a complete trust gate.
Runtime Receipt
A skill approval record should include more than a scanner verdict. It should record what was submitted: the skill hash, declared capability, file tree, setup commands, and ignored paths. It should record how the bundle was scanned: the scanner version, rule set, and LLM-judge prompt if any. It should record the conditions of the run: the sandbox image, allowed network egress, mounted secrets, taint markers, and runtime timeout. And it should record what was observed: file reads and writes, process tree, network destinations, detected source-to-sink flows, and whether the skill materialized new instructions during execution.
That receipt changes the governance object. A marketplace listing says a skill exists. A static scan says the install-time view matched or did not match known warning patterns. A detonation receipt says what happened when an agent actually interpreted the bundle. For agent skills, that third record is where trust becomes inspectable.
Limits
The paper is also explicit about limits. Dynamic auditing cannot guarantee that every adversarially gated behavior has been observed. A skill can depend on live endpoints, missing accounts, API keys, or branch conditions that a sandbox does not reproduce. Some wild skills were missed because their malicious behavior never fired: the execution path was not reached, the environment was unavailable, or the run hit a timeout. Anti-sandbox behavior remains a live problem.
Those limits make the case for layered control rather than a single perfect scanner. Static checks remain useful for hygiene, but they should be paired with runtime observation, integrity checks between scan and execution, realistic sandbox environments, repeated runs, and policy that treats unobserved behavior as uncertainty rather than clearance.
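One way to encode "unobserved behavior is uncertainty, not clearance" is a three-valued verdict. The sketch below is an illustrative policy, not one taken from the paper. The function name decide, its parameters, and the Verdict values are assumptions. It also folds in the integrity check between scan and execution by comparing the hash that was scanned with the hash that was actually run.

```python
from enum import Enum


class Verdict(Enum):
    BLOCK = "block"              # evidence of harm or tampering
    UNCERTAIN = "uncertain"      # coverage gaps; needs review or a rerun
    NO_FINDINGS = "no_findings"  # nothing observed under full coverage


def decide(
    scanned_hash: str,
    executed_hash: str,
    flows: list[str],
    runs_completed: int,
    runs_required: int,
    timed_out: bool,
    unreachable_endpoints: list[str],
) -> Verdict:
    """Combine integrity, observed flows, and coverage into one verdict."""
    # The bundle changed between scan and execution: no result applies to it.
    if scanned_hash != executed_hash:
        return Verdict.BLOCK
    # Any sensitive source-to-sink flow is treated as disqualifying.
    if flows:
        return Verdict.BLOCK
    # Gaps in what the sandbox could exercise are not a clean bill of health.
    if timed_out or unreachable_endpoints or runs_completed < runs_required:
        return Verdict.UNCERTAIN
    return Verdict.NO_FINDINGS


# Usage: a run with no flows but a timeout is not cleared.
print(decide("abc", "abc", [], 3, 3, True, []))  # Verdict.UNCERTAIN
```

The highest value is deliberately named NO_FINDINGS rather than something like "safe": even with full coverage, the result describes only what the sandbox observed.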
Sources
- Zimo Ji, Congying Xu, Zongjie Li, Yudong Gao, Xin Wei, Shuai Wang, and Shing-Chi Cheung, Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware, arXiv:2607.02357 [cs.CR, cs.SE].
- arXiv HTML for Cloak and Detonate, checked for abstract, scanner-evasion setup, SkillCloak and SkillDetonate design, evaluation numbers, discussion, and limitations.
- arXiv PDF for Cloak and Detonate, checked against title-page metadata, affiliations, dataset descriptions, benchmark results, ablations, and limitation statements.