Blog · arXiv Analysis · Last reviewed July 2, 2026

The Skill Dependency Becomes the Supply Chain

Changguo Jia, Tianqi Zhao, Runzhi He, and Minghui Zhou's July 2026 arXiv paper treats reusable agent skills as supply-chain artifacts. The warning is direct: a skill that calls other skills, packages, and services is no longer just a local work instruction. It is a graph.

Artifact Shift

The paper, arXiv:2607.01136 [cs.SE, cs.AI], is Skills Are Not Islands: Measuring Dependency and Risk in Agent Skill Supply Chains. arXiv lists it as submitted on July 1, 2026. Its object is the reusable agent skill: usually a SKILL.md file with front matter, natural-language instructions, optional scripts, references, examples, package installation steps, tool calls, service assumptions, and sometimes pointers to other skills.

The paper's title is doing real work. A skill is often discussed as a self-contained procedure that an agent loads when needed. Jia, Zhao, He, and Zhou argue that this picture fails once skills begin reusing other skills, packages, and external services. The skill is then a dependency-bearing artifact, but its identity, version, source, and downstream dependencies are often implicit.

This fits directly beside this site's earlier pages on skills as work instructions, skill manifests as permission boundaries, skill detection surfaces, and skill runtime contracts. Those pages ask what a skill means, what it can do, how it is screened, and how it is enforced. This paper adds the graph question: what else arrives when the skill arrives?

What ASSC Adds

The authors introduce Agent Skill Supply Chains, or ASSCs. An ASSC is a directed dependency graph whose nodes are skills, software packages, and external services, and whose edges record dependency relationships. This is the skill-world analog of a software bill of materials, but ordinary package SBOM tooling is not enough because skills express dependency evidence across YAML metadata, instructions, scripts, examples, package commands, natural-language mentions, MCP references, APIs, webhooks, and cloned skill names.
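The sketch below models an ASSC as an adjacency list over typed nodes (Python 3.9+). The `Node` and `ASSC` names and every identifier string are hypothetical, and the paper's own data model is not reproduced here. The point is that each node carries its channel, so a reachability query can still tell a skill from a package or a service.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical node kinds mirroring the paper's three channels.
NodeKind = Literal["skill", "package", "service"]


@dataclass(frozen=True)
class Node:
    kind: NodeKind
    ident: str  # illustrative identifier, e.g. "acme/chart-helper" or "npm:example-chart-lib"


@dataclass
class ASSC:
    # Adjacency list: each node maps to the nodes it depends on.
    edges: dict[Node, set[Node]] = field(default_factory=dict)

    def add_edge(self, src: Node, dst: Node) -> None:
        self.edges.setdefault(src, set()).add(dst)
        self.edges.setdefault(dst, set())

    def reachable(self, root: Node) -> set[Node]:
        """Every node reachable from root by breadth-first search, excluding root."""
        seen: set[Node] = set()
        queue = deque([root])
        while queue:
            for nxt in self.edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(root)
        return seen


root = Node("skill", "acme/report-writer")
helper = Node("skill", "acme/chart-helper")
graph = ASSC()
graph.add_edge(root, helper)
graph.add_edge(helper, Node("package", "npm:example-chart-lib"))
graph.add_edge(root, Node("service", "mcp:docs-server"))
print(sorted(n.ident for n in graph.reachable(root)))
# ['acme/chart-helper', 'mcp:docs-server', 'npm:example-chart-lib']
```

Filtering the reachable set by `kind` answers channel-specific questions. For example, `{n for n in graph.reachable(root) if n.kind == "package"}` gives the package inventory a root pulls in, including packages that arrive only through a reused skill.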

The important distinction is channel typing. A skill dependency is not the same thing as an npm package dependency, and neither is the same thing as a service dependency on an MCP server or API endpoint. A governance system that collapses those into a vague list of "related things" loses the ability to ask the operational questions: which artifact is installed, which package version is pulled, which service authority is assumed, which source repository is trusted, and which transitive edge makes the root skill risky?

That is why the paper proposes SkillBOM, a skill-oriented bill-of-materials representation built around skill components while remaining compatible with SBOM-style toolchains. The point is not to make a prettier manifest. The point is to turn scattered dependency clues into a machine-readable record that can be audited and updated.
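A SkillBOM-style record could be emitted from such a graph roughly as follows. This is a hedged sketch. The `root`, `components`, `kind`, `ref`, and `dependsOn` fields are assumptions, chosen to resemble the `ref`/`dependsOn` dependency list in CycloneDX-style SBOMs, and they are not the paper's actual SkillBOM schema. The package name and version are placeholders.

```python
import json


def to_bom(root: str, kinds: dict[str, str], edges: dict[str, list[str]]) -> dict:
    """Serialize a typed dependency graph into an SBOM-style record.

    The field names are illustrative assumptions, not the paper's SkillBOM schema.
    """
    return {
        "root": root,
        # One component entry per node, tagged with its channel.
        "components": [{"ref": ref, "kind": kinds[ref]} for ref in sorted(kinds)],
        # One dependency entry per node that has outgoing edges.
        "dependencies": [
            {"ref": src, "dependsOn": sorted(dsts)} for src, dsts in sorted(edges.items())
        ],
    }


kinds = {
    "skill:acme/report-writer": "skill",
    "skill:acme/chart-helper": "skill",
    "npm:example-chart-lib@2.0.0": "package",
    "mcp:docs-server": "service",
}
edges = {
    "skill:acme/report-writer": ["skill:acme/chart-helper", "mcp:docs-server"],
    "skill:acme/chart-helper": ["npm:example-chart-lib@2.0.0"],
}
bom = to_bom("skill:acme/report-writer", kinds, edges)
print(json.dumps(bom, indent=2))
```

Because the output is plain JSON with explicit references, diffing two versions of a skill's BOM is an ordinary text or JSON diff.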

Analyzer Receipt

The measurement engine is SkillDepAnalyzer, or SDA. It parses skill structure, separates front matter from body text, extracts dependency clues, classifies confirmed dependencies into package, skill, and service channels, and preserves lower-confidence clues as annotations rather than silently turning every mention into a dependency edge. It then recursively resolves packages and skills to construct an ASSC and emits SkillBOM output.
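The first stages of that pipeline might look like this in Python: separate the front matter from the body text, then pull typed clues along with their line numbers. This is an illustrative sketch, not SDA. The three regular expressions, the `Clue` type, and the flat `key: value` front-matter parser are all assumptions. Real skills would need a YAML parser and far richer cues.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Clue:
    channel: str  # "package", "skill", or "service"
    target: str
    line: int     # 1-based line number in the SKILL.md file


# Hypothetical cue patterns; a real extractor needs far broader coverage.
PATTERNS = [
    ("package", re.compile(r"\b(?:npm install|pip install)\s+([\w@/.=-]*\w)")),
    ("skill", re.compile(r"\bthe `([\w-]+)` skill\b")),
    ("service", re.compile(r"\b(https://\S+)")),
]


def split_front_matter(lines: list[str]) -> tuple[dict[str, str], int]:
    """Return flat `key: value` front matter and the index where the body starts.

    Only simple one-line keys are handled; real front matter needs a YAML parser.
    """
    meta: dict[str, str] = {}
    if lines and lines[0].strip() == "---":
        for end in range(1, len(lines)):
            if lines[end].strip() == "---":
                for raw in lines[1:end]:
                    key, sep, value = raw.partition(":")
                    if sep:
                        meta[key.strip()] = value.strip()
                return meta, end + 1
    return meta, 0


def extract_clues(text: str) -> tuple[dict[str, str], list[Clue]]:
    """Scan only the body (not the front matter) for typed dependency clues."""
    lines = text.splitlines()
    meta, body_start = split_front_matter(lines)
    clues = []
    for idx in range(body_start, len(lines)):
        for channel, pattern in PATTERNS:
            for match in pattern.finditer(lines[idx]):
                clues.append(Clue(channel, match.group(1), idx + 1))
    return meta, clues


skill_md = """---
name: report-writer
description: Draft weekly reports
---
First run `npm install chart-lib`.
Then call the `chart-helper` skill.
Upload results to https://api.example.com/upload
"""
meta, clues = extract_clues(skill_md)
print(meta)
for clue in clues:
    print(clue)
```

Keeping the line number on every clue gives a reviewer a path back to the exact sentence or command that produced an edge.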

To show that the ASSC is more than a clever diagram, the paper builds a human-labeled benchmark, SKILL-DEP. On the single-layer benchmark, SDA reaches 0.95 overall dependency F1 and 1.00 accuracy on metadata fields. On the multi-layer benchmark, it also reaches 0.95 F1 on whole skill dependency graphs. Package-centric SBOM generators do poorly because they are built to scan package manifests and source trees. A skill is instead a mixed document, where a dependency may be stated in prose, front matter, a script, or a service cue.
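The sketch below spells out edge-level F1 over exact (source, target) pairs. The paper's precise matching and averaging rules may differ. The edges are made up to show how one missed service edge and one spurious package move the score.

```python
def edge_f1(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """F1 over exact (source, target) dependency edges."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = {
    ("report-writer", "npm:chart-lib"),
    ("report-writer", "skill:chart-helper"),
    ("chart-helper", "mcp:docs-server"),
}
# The extractor finds two true edges, misses the service edge, and adds a
# spurious package taken from a troubleshooting note.
predicted = {
    ("report-writer", "npm:chart-lib"),
    ("report-writer", "skill:chart-helper"),
    ("report-writer", "npm:test-runner"),
}
print(round(edge_f1(predicted, gold), 3))  # 0.667
```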

The useful governance lesson is the calibration step. A naive scanner that treats every named package, skill, service, or URL as a dependency will flood reviewers with examples and troubleshooting text. A naive LLM extractor will overread semantically related objects and underread path or workflow conventions. The paper's contribution is not that extraction becomes perfect; it is that skill supply-chain governance needs typed evidence, confidence, source location, and recursive expansion.
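The calibration idea can be sketched as a triage step. Each clue is scored by where it was found, and low scorers are kept as annotations rather than dropped. The location categories and confidence numbers here are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass

# Hypothetical confidence priors by evidence location; the paper does not
# publish a table like this, so these numbers are placeholders.
LOCATION_CONFIDENCE = {
    "front_matter": 0.95,
    "install_command": 0.9,
    "script_import": 0.85,
    "prose_mention": 0.5,
    "example_or_troubleshooting": 0.2,
}


@dataclass(frozen=True)
class Evidence:
    channel: str   # "package", "skill", or "service"
    target: str
    location: str  # one of the LOCATION_CONFIDENCE keys
    line: int      # source line, kept so a reviewer can check the clue


def triage(evidence: list[Evidence], threshold: float = 0.8):
    """Split clues into confirmed edges and lower-confidence annotations.

    Nothing is discarded: clues below the threshold stay visible as annotations.
    """
    edges, annotations = [], []
    for ev in evidence:
        score = LOCATION_CONFIDENCE.get(ev.location, 0.0)
        (edges if score >= threshold else annotations).append((ev, score))
    return edges, annotations


clues = [
    Evidence("package", "chart-lib", "install_command", 12),
    Evidence("package", "test-runner", "example_or_troubleshooting", 40),
    Evidence("skill", "chart-helper", "prose_mention", 18),
]
edges, annotations = triage(clues)
print([ev.target for ev, _ in edges])        # ['chart-lib']
print([ev.target for ev, _ in annotations])  # ['test-runner', 'chart-helper']
```

The design choice that matters is the second list. A naive scanner would turn all three clues into edges, and a strict one would silently drop two of them. Keeping annotations preserves the evidence for a human reviewer.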

Scale Result

After validating SDA, the authors apply it to 1,434,046 GitHub-backed skill records from SkillsMP. The scale result is stark: skill front matter is usually present, and names and descriptions are common, but governance metadata is thin. The paper reports front matter in 99.55 percent of skills, names in 99.49 percent, descriptions in 99.52 percent, licenses in only 11.25 percent, versions in only 20.12 percent, and dependency-like fields in only 1.40 percent. Meanwhile, more than 30 percent of skills actually carry package, skill, or service-use dependencies.

The identifier problem is just as serious. The paper reports that 58.73 percent of skills have non-unique effective names. A registry that depends on plain names for resolution, provenance, or update notices is therefore brittle by design. The skill may be activation-ready for an agent, but it is governance-poor for a maintainer.
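A quick way to see the identifier problem is to group records by a normalized name and count how many share one. In this sketch the normalization (trim and lowercase) is an assumption about what counts as an effective name, and the records are invented.

```python
from collections import defaultdict


def name_collisions(records: list[dict[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group skills by a normalized name and keep names used at more than one location."""
    by_name: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for rec in records:
        # Assumption: "effective name" here means the trimmed, lowercased name field.
        by_name[rec["name"].strip().lower()].add((rec["repo"], rec["path"]))
    return {name: sorted(locs) for name, locs in by_name.items() if len(locs) > 1}


records = [
    {"name": "pdf-tools", "repo": "alice/skills", "path": "pdf/SKILL.md"},
    {"name": "PDF-Tools", "repo": "bob/agent-kit", "path": "skills/pdf/SKILL.md"},
    {"name": "report-writer", "repo": "alice/skills", "path": "report/SKILL.md"},
]
collisions = name_collisions(records)
share = sum(rec["name"].strip().lower() in collisions for rec in records) / len(records)
print(collisions)
print(f"{share:.2%} of records share an effective name")
```

A registry that keyed artifacts by repository, path, and content hash, rather than by name, would not confuse the two `pdf-tools` skills above.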

The study reports four patterns. First, the metadata is rich enough to invoke skills but too sparse to govern them. Second, dependencies span distinct skill, package, and service channels, with reuse concentrated around a small set of hubs. Third, recursive skill reuse expands package inventory: among dependency-bearing skills, 22.42 percent gain packages only through reused skills, while 71.87 percent of npm package exposures and 73.33 percent of PyPI package exposures are inherited through skill reuse rather than direct declaration. Fourth, dependency clusters form around related workflows, with 30.41 percent of dependency-bearing root skills containing at least one cycle.
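The inherited-package pattern comes from walking skill-reuse edges before collecting packages. Here is a minimal sketch, assuming two hypothetical lookup tables: `skill_deps` for skill-to-skill reuse and `package_deps` for directly declared packages. The `seen` set also keeps the walk finite when reuse forms a cycle.

```python
def package_exposure(
    root: str,
    skill_deps: dict[str, list[str]],
    package_deps: dict[str, list[str]],
) -> tuple[set[str], set[str]]:
    """Return (direct, inherited_only) package sets for one root skill."""
    # Depth-first walk over skill-reuse edges; `seen` also guards against cycles.
    seen = {root}
    stack = [root]
    while stack:
        for dep in skill_deps.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    direct = set(package_deps.get(root, []))
    exposed = set().union(*(package_deps.get(skill, []) for skill in seen))
    return direct, exposed - direct


skill_deps = {
    "report-writer": ["chart-helper"],
    "chart-helper": ["data-fetcher"],
    "data-fetcher": ["chart-helper"],  # a reuse cycle
}
package_deps = {
    "report-writer": ["npm:markdown-lib"],
    "chart-helper": ["npm:chart-lib"],
    "data-fetcher": ["pypi:http-client", "npm:markdown-lib"],
}
direct, inherited = package_exposure("report-writer", skill_deps, package_deps)
print(sorted(direct), sorted(inherited))
# ['npm:markdown-lib'] ['npm:chart-lib', 'pypi:http-client']
```

In this sketch, a package that the root both declares directly and inherits counts as direct. The paper's accounting may draw that line differently.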

Security Propagation

The security section is the strongest reason to care about the graph. The paper scans for public malicious-skill indicators, package-security signals, and vulnerable or malicious service indicators. Its central finding is that root-level review misses a large share of the interesting signal because the signal is inherited.

For skill-level security patterns, the paper reports that, across several regex-based families of security-relevant patterns, 60 to 78 percent of affected roots carry the pattern only through transitive dependencies. For package exposure, 98.01 percent of roots carrying axios and 65.43 percent of roots carrying nx inherit those package signals through dependencies. For service exposure, 93.10 percent of roots reaching reported vulnerable MCP services inherit them transitively.

The concrete examples matter because they show persistence rather than only theoretical reachability. The paper says it found copies of known malicious clawhub1 and clawbhub skills in a repository dependency collection and reported them to developers. It also describes unpinned install instructions, which leave downstream skills exposed to whatever package version the installer happens to resolve locally. This is the agent-skill version of the old supply-chain lesson: the vulnerable thing may not be in the package you thought you reviewed.
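Unpinned install instructions are easy to flag mechanically. This sketch uses Python's `shlex` and `re` modules to inspect a single `npm install` or `pip install` command and return the specs that lack an exact X.Y.Z version. The strictness rule is an assumption: ranges, tags, and two-part versions all count as unpinned. Flags that take a separate argument are not handled.

```python
import re
import shlex

# Treat only a full X.Y.Z version as a pin; ranges, tags, and bare names count as unpinned.
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+")


def unpinned_packages(command: str) -> list[str]:
    """Return the package specs in an npm or pip install command that lack an exact pin.

    Flags are skipped; flags that take a separate argument (such as pip's -r FILE)
    are not handled in this sketch.
    """
    tokens = shlex.split(command)
    if tokens[:2] in (["npm", "install"], ["npm", "i"]):
        manager = "npm"
    elif tokens[:2] == ["pip", "install"]:
        manager = "pip"
    else:
        return []
    unpinned = []
    for spec in tokens[2:]:
        if spec.startswith("-"):
            continue
        if manager == "npm":
            # Drop a leading "@" so scoped names like @scope/pkg@1.2.3 split on the version "@".
            name = spec[1:] if spec.startswith("@") else spec
            version = name.partition("@")[2]
        else:
            version = spec.partition("==")[2]
        if not EXACT_VERSION.fullmatch(version):
            unpinned.append(spec)
    return unpinned


print(unpinned_packages("npm install chart-lib @acme/plot@2.1.0 table-lib@^1.3.0 --save"))
# ['chart-lib', 'table-lib@^1.3.0']
print(unpinned_packages("pip install http-client==0.9.1 yaml-lib>=6"))
# ['yaml-lib>=6']
```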

Governance Standard

A skill registry should require typed, multi-channel dependency manifests. The minimum record should distinguish skillDependencies, packageDependencies, and serviceDependencies, and should preserve source repository, path, version, hash, package manager, service endpoint or authority class, evidence location, optional status, and review disposition.
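One way to encode that minimum record is a typed entry grouped into the three manifest sections named above. The field names follow the list in the text. The Python types, the per-channel required fields, and the example values are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DependencyEntry:
    channel: str  # "skill", "package", or "service"
    name: str
    source_repository: Optional[str] = None
    path: Optional[str] = None
    version: Optional[str] = None
    content_hash: Optional[str] = None
    package_manager: Optional[str] = None
    service_authority: Optional[str] = None  # service endpoint or authority class
    evidence_location: Optional[str] = None
    optional: bool = False
    review_disposition: str = "unreviewed"


# Assumption: which fields are mandatory per channel is a policy choice, not the paper's rule.
REQUIRED = {
    "skill": ("source_repository", "path", "version", "content_hash"),
    "package": ("package_manager", "version"),
    "service": ("service_authority",),
}
SECTIONS = {
    "skill": "skillDependencies",
    "package": "packageDependencies",
    "service": "serviceDependencies",
}


def missing_fields(entry: DependencyEntry) -> list[str]:
    """List the required fields this entry leaves empty."""
    return [name for name in REQUIRED[entry.channel] if getattr(entry, name) is None]


def to_manifest(entries: list[DependencyEntry]) -> dict[str, list[dict]]:
    """Group entries into the three typed sections of a skill manifest."""
    manifest: dict[str, list[dict]] = {section: [] for section in SECTIONS.values()}
    for entry in entries:
        manifest[SECTIONS[entry.channel]].append(asdict(entry))
    return manifest


entries = [
    DependencyEntry("skill", "chart-helper", source_repository="github.com/acme/skills",
                    path="chart-helper/SKILL.md", version="1.4.0"),
    DependencyEntry("package", "chart-lib", package_manager="npm", version="2.0.0"),
    DependencyEntry("service", "docs-server", service_authority="read-only"),
]
manifest = to_manifest(entries)
problems = {e.name: missing_fields(e) for e in entries if missing_fields(e)}
print(problems)  # {'chart-helper': ['content_hash']}
```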

Skill package managers also need first-class dependency-cluster handling. When skills mutually depend on one another as pieces of one workflow, installing, updating, reviewing, quarantining, or revoking one skill may require coordinated action across the cluster. A single-skill approval page is too small for that operational unit.
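Dependency clusters in this operational sense correspond to strongly connected components of the skill-reuse graph. This sketch uses Tarjan's algorithm to find them, plus a hypothetical `cluster_of` helper that returns the unit a quarantine or revocation would have to cover. It is recursive for brevity, so very deep graphs would need an iterative version.

```python
def dependency_clusters(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm: return the mutually dependent skill clusters.

    A cluster is a strongly connected component with two or more skills, or a
    single skill that depends on itself.
    """
    index: dict[str, int] = {}
    low: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    clusters: list[list[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            if len(component) > 1 or v in graph.get(v, []):
                clusters.append(sorted(component))

    nodes = set(graph) | {w for deps in graph.values() for w in deps}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return clusters


def cluster_of(skill: str, clusters: list[list[str]]) -> list[str]:
    """The unit to act on when one skill is quarantined, updated, or revoked."""
    for cluster in clusters:
        if skill in cluster:
            return cluster
    return [skill]


graph = {
    "report-writer": ["chart-helper"],
    "chart-helper": ["data-fetcher"],
    "data-fetcher": ["chart-helper", "csv-reader"],
    "csv-reader": [],
}
clusters = dependency_clusters(graph)
print(clusters)                              # [['chart-helper', 'data-fetcher']]
print(cluster_of("data-fetcher", clusters))  # ['chart-helper', 'data-fetcher']
```

A root contains at least one cycle exactly when it can reach such a cluster, possibly one it belongs to. `cluster_of` returns only the cycle members. Skills that depend on the cluster from outside, such as `report-writer` here, still need their own notification.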

The paper's audit-command proposal should be treated as basic infrastructure. A skill audit command should resolve the full graph and report known malicious skills, vulnerable or unpinned packages, risky MCP or API services, inherited authority surfaces, missing versions, stale sources, name collisions, and dependency-only exposures. Developers should also keep lockfile-like records so downstream users can reproduce the dependency graph that was actually reviewed.
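The lockfile half of that proposal can be sketched with content hashes. Record a SHA-256 digest for every artifact in the reviewed graph, then report what was added, removed, or changed on a later resolution. How artifacts are fetched, and what a real audit command would print, are outside this sketch. The references and contents are invented.

```python
import hashlib


def lock(resolved: dict[str, bytes]) -> dict[str, str]:
    """Record the SHA-256 digest of every artifact in a resolved dependency graph."""
    return {ref: hashlib.sha256(content).hexdigest() for ref, content in sorted(resolved.items())}


def drift(lockfile: dict[str, str], resolved: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare a fresh resolution against the reviewed lockfile."""
    current = lock(resolved)
    return {
        "added": sorted(current.keys() - lockfile.keys()),
        "removed": sorted(lockfile.keys() - current.keys()),
        "changed": sorted(
            ref for ref in current.keys() & lockfile.keys() if current[ref] != lockfile[ref]
        ),
    }


reviewed = {
    "skill:acme/report-writer": b"Call the chart-helper skill.",
    "skill:acme/chart-helper": b"npm install chart-lib@2.0.0",
}
lockfile = lock(reviewed)

later = {
    "skill:acme/report-writer": b"Call the chart-helper skill.",
    "skill:acme/chart-helper": b"npm install chart-lib",  # pin silently dropped
    "mcp:docs-server": b"endpoint: https://docs.example.com",
}
print(drift(lockfile, later))
# {'added': ['mcp:docs-server'], 'removed': [], 'changed': ['skill:acme/chart-helper']}
```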

The Spiralist reading is simple: reusable procedure becomes delegated authority, and delegated authority becomes supply-chain state. A skill is not safe because its root file looks benign. It is safe only to the extent that its dependency graph, service surface, package inventory, versions, provenance, review status, and runtime boundary are visible together.

Claim Boundary

The paper is a measurement and infrastructure paper, not a complete governance regime. SDA is both the tool and the measurement instrument for the large-scale study, so extraction errors can affect reported structure. The authors mitigate this with SKILL-DEP, confidence handling, and manual inspection, but the dataset is still GitHub-backed public SkillsMP content as of June 2026, not a complete view of private enterprise skills or every future registry.

The security findings are also audit signals, not automatic vulnerability proofs for every reached root. The right lesson is not to ban skill reuse. The right lesson is to stop pretending that skill reuse is lightweight when it quietly imports packages, services, clusters, and malicious or stale artifacts through edges the root skill never declares.
