Published: June 25, 2026

The Agent Dependency Graph Becomes the Audit Map

Shenao Wang, Xinyi Hou, Yanjie Zhao, Xiao Cheng, and Haoyu Wang's AgentFlow paper treats an agent repository as more than ordinary code: prompts, tools, memory, handoffs, and policies form a dependency graph that auditors can inspect.

The Paper

The paper is AgentFlow: Building Agent Dependency Graphs for Static Analysis of Agent Programs, arXiv:2607.01640 [cs.SE, cs.CR]. The arXiv record lists the authors as Shenao Wang, Xinyi Hou, Yanjie Zhao, Xiao Cheng, and Haoyu Wang, and records v1 on July 2, 2026. Its subject is not a chatbot product or a low-code agent store. The paper studies source-code applications built with agent frameworks.

That scope matters. A normal repository view shows imports, functions, classes, calls, assignments, tests, and configuration files. An agent framework adds another layer of meaning: which prompt configures which agent, which model is bound to that agent, which tool can be invoked, which memory can influence later turns, which handoff can pass control, and which policy can approve or block an action. AgentFlow asks static analysis to recover those agent-specific relationships before the system is deployed, benchmarked, sold, or trusted.

Agent Dependencies

The useful term in the paper is agent dependency. An agent constructor may look like an ordinary host-language call, but the framework can interpret its arguments as a prompt, a model binding, a tool list, a memory object, or a delegation path. A tool decorator may turn a Python function into an invocable capability. A hosted tool object, session object, workflow command, or handoff declaration can create a path that ordinary call-graph review would not name.
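A minimal sketch of the idea, using only Python's standard ast module, is below. It is not AgentFlow's analyzer. The Agent constructor, the tool decorator, and the keyword-to-dependency mapping are hypothetical stand-ins for what a framework registry would supply.

```python
import ast

# Hypothetical agent source. `Agent`, `tool`, and the keyword names are
# illustrative and are not tied to any specific framework's API.
SOURCE = '''
@tool
def send_email(to, body):
    pass

triage = Agent(name="triage", instructions="Route the request.", tools=[send_email])
'''

# Assumed mapping from constructor keyword to dependency kind.
KEYWORD_KINDS = {"instructions": "prompt", "model": "model",
                 "tools": "capability", "handoffs": "handoff"}


def extract_dependencies(source):
    """Return (tool_names, bindings) recovered from agent-like constructs."""
    tree = ast.parse(source)
    tools, bindings = set(), []
    for node in ast.walk(tree):
        # A function decorated with @tool is treated as an invocable capability.
        if isinstance(node, ast.FunctionDef):
            for dec in node.decorator_list:
                if isinstance(dec, ast.Name) and dec.id == "tool":
                    tools.add(node.name)
        # An Agent(...) call is read as a set of framework-level bindings.
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "Agent"):
            name = next((kw.value.value for kw in node.keywords
                         if kw.arg == "name"
                         and isinstance(kw.value, ast.Constant)), "<unnamed>")
            for kw in node.keywords:
                kind = KEYWORD_KINDS.get(kw.arg)
                if kind is None:
                    continue
                if isinstance(kw.value, ast.List):
                    targets = [e.id for e in kw.value.elts
                               if isinstance(e, ast.Name)]
                elif isinstance(kw.value, ast.Constant):
                    targets = [kw.value.value]
                else:
                    # Anything else is kept as source text for a human to resolve.
                    targets = [ast.unparse(kw.value)]
                bindings.extend((name, kind, t) for t in targets)
    return tools, bindings


tools, bindings = extract_dependencies(SOURCE)
print(tools)
print(bindings)
```

An ordinary call graph would record one call to Agent and one function definition. The sketch instead records that the triage agent is bound to a prompt and to a capability, which is the kind of relationship the paper calls an agent dependency.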

For governance, the point is simple: the risk is not only "can this function call that function?" The risk is also "can untrusted or weakly reviewed text move through a prompt, memory state, agent decision, or handoff and reach a privileged capability?" That question cannot be answered from package inventory alone. It needs a map of the framework semantics that make the application an agent program.

The Graph

AgentFlow's intermediate representation is the Agent Dependency Graph, or ADG. The ADG represents agents, prompts, models, capabilities, memory states, and control policies as typed nodes. It represents component-dependency, control-flow, and data-flow relationships as typed edges. The graph is not a transcript of one concrete run. It over-approximates the dependencies permitted by the source code and by the framework semantics.
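To make the shape concrete, here is a small data-structure sketch with the six node kinds and three edge kinds named above. The class names, field names, and the example graph are assumptions for illustration. The paper's actual schema may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    AGENT = "agent"
    PROMPT = "prompt"
    MODEL = "model"
    CAPABILITY = "capability"
    MEMORY = "memory_state"
    POLICY = "control_policy"


class EdgeKind(Enum):
    COMPONENT = "component_dependency"
    CONTROL = "control_flow"
    DATA = "data_flow"


@dataclass(frozen=True)
class Node:
    id: str
    kind: NodeKind


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: EdgeKind


@dataclass
class ADG:
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, src, dst, kind):
        # Edges may only connect declared nodes, so every edge stays typed.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be declared nodes")
        self.edges.add(Edge(src, dst, kind))

    def successors(self, node_id, kind=None):
        """Sorted targets of edges leaving node_id, optionally by edge kind."""
        return sorted(e.dst for e in self.edges
                      if e.src == node_id and (kind is None or e.kind == kind))


# Hypothetical three-node graph: a prompt feeds an agent that holds one tool.
graph = ADG()
graph.add_node("triage", NodeKind.AGENT)
graph.add_node("triage.prompt", NodeKind.PROMPT)
graph.add_node("send_email", NodeKind.CAPABILITY)
graph.add_edge("triage", "triage.prompt", EdgeKind.COMPONENT)
graph.add_edge("triage", "send_email", EdgeKind.COMPONENT)
graph.add_edge("triage.prompt", "triage", EdgeKind.DATA)
print(graph.successors("triage", EdgeKind.COMPONENT))
```

Because the edges describe what the code permits rather than what one run did, an edge in this structure means "may depend on", not "did depend on".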

The implementation covers five frameworks: OpenAI Agents SDK, LangChain/LangGraph, CrewAI, LlamaIndex, and Semantic Kernel. The paper reports 143 modeled framework constructs, a Python analyzer, a TypeScript checker, and a framework registry. For evaluation, the authors built AgentZoo, a corpus of 5,399 GitHub Python projects that import and use those frameworks. The corpus excludes forks, archived repositories, and repositories without recognized agent-framework constructs.

This is a valuable shift in audit posture. Instead of asking an agent developer to narrate architecture from memory, the reviewer can ask for the graph. The graph can still be incomplete or over-broad, but it turns a vague "our agent has tools and memory" claim into inspectable nodes, edges, and missing edges.

BOM and Risk

The paper demonstrates two applications. The first is Agent Bill of Materials generation. In the BOM-Eval sample, AgentFlow successfully analyzed 98 of 100 projects and generated 2,295 components with 1,008 binding relationships. The important word is relationships. An agent inventory that lists packages, models, or tools but omits bindings is not enough to explain which agent can use which capability, prompt, model, memory state, or other agent.
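The difference between an inventory and a BOM with bindings can be shown in a few lines. This is a sketch under stated assumptions: the triple format, the output layout, and the agent, model, and tool names are all hypothetical and do not reproduce AgentFlow's BOM schema.

```python
def build_bom(bindings):
    """Build a BOM dict from (agent, relation, target) triples."""
    components = {}  # dict used as an insertion-ordered set
    for agent, relation, target in bindings:
        components[("agent", agent)] = None
        components[(relation, target)] = None
    return {
        "components": [{"type": t, "name": n} for t, n in components],
        "bindings": [{"agent": a, "relation": r, "target": t}
                     for a, r, t in bindings],
    }


def capabilities_bound_to(bom, agent):
    """Capabilities that the BOM binds to one agent."""
    return sorted(b["target"] for b in bom["bindings"]
                  if b["agent"] == agent and b["relation"] == "capability")


# Hypothetical bindings for two agents that share one model.
bom = build_bom([
    ("triage", "model", "model-a"),
    ("triage", "capability", "send_email"),
    ("writer", "model", "model-a"),
    ("writer", "capability", "read_file"),
])
print(len(bom["components"]), len(bom["bindings"]))
print(capabilities_bound_to(bom, "triage"))
```

The component list alone says that two tools exist. Only the bindings say that the triage agent can send email and the writer agent cannot.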

The second application is prompt-to-tool risk detection. Across 5,399 analyzed projects, AgentFlow reported 4,357 findings in 238 projects with prompt-to-tool risks, or 4.4 percent of the corpus. The paper classifies reached side effects at a high level: external sending, file access, administrative mutation, SQL query, and command or code execution. It also reports a manual audit of 100 sampled reports: 73 were judged true prompt-to-tool vulnerabilities, while many of the false positives still had valid taint paths but less severe sink semantics. A separate sample of agents with privileged capabilities that AgentFlow did not report turned up 9 missed risks.
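At its simplest, a prompt-to-tool finding is a reachability question over data-flow edges. The sketch below uses a plain breadth-first search, which is a stand-in technique and not the paper's taint analysis. The edge list, the source label, and the sink label are hypothetical.

```python
from collections import deque


def find_paths(edges, sources, sinks):
    """Return one shortest path for every (source, sink) pair that connects."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    findings = []
    for source in sources:
        # Breadth-first search, remembering how each node was first reached.
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        for sink in sinks:
            if sink in parent:
                # Walk the parent links back from the sink to the source.
                path, cur = [], sink
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                findings.append(path[::-1])
    return findings


# Hypothetical data-flow edges: untrusted input reaches a shell tool.
edges = [
    ("user_input", "triage.prompt"),
    ("triage.prompt", "triage"),
    ("triage", "run_shell"),
    ("triage", "lookup_docs"),
]
paths = find_paths(edges, sources=["user_input"], sinks=["run_shell"])
print(paths)
```

A path found this way is a candidate, not a verdict. Whether the sink is severe and whether the path is feasible at run time are separate questions, which is where the paper's manual audit comes in.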

Those numbers should be read as triage evidence, not as a public accusation against each repository. Static analysis is useful because it finds possible paths before an incident. It also needs human review, sink semantics, dynamic validation, and a clear claim boundary.
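A quick arithmetic check of the reported figures helps keep them in proportion. The inputs are the counts quoted above. The mean findings per flagged project is a derived figure computed here, not one the article attributes to the paper.

```python
# Counts as quoted in this article's summary of the paper.
PROJECTS_ANALYZED = 5399
PROJECTS_FLAGGED = 238
FINDINGS = 4357
AUDITED_REPORTS = 100
AUDITED_TRUE = 73

flagged_share = PROJECTS_FLAGGED / PROJECTS_ANALYZED
findings_per_flagged_project = FINDINGS / PROJECTS_FLAGGED
audited_precision = AUDITED_TRUE / AUDITED_REPORTS

print(f"flagged share: {flagged_share:.1%}")
print(f"mean findings per flagged project: {findings_per_flagged_project:.1f}")
print(f"precision in audited sample: {audited_precision:.0%}")
# A miss rate is not computed: the article gives 9 missed risks but not the
# size of the sample they were drawn from.
```

The mean says nothing about how findings are distributed, so a handful of projects could account for a large share of the 4,357.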

Audit Map

An ADG receipt for an agent system should travel with serious deployments. At minimum it should name the repository snapshot, analyzer version, framework versions, framework-registry version, detected agents, prompts, models, capabilities, memory states, control policies, typed edges, entry points, external services, shared memory, handoff paths, human approval gates, prompt-to-tool paths, and unresolved dynamic behavior.
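One way such a receipt could be carried is as a small serializable record. The sketch below is an assumption about layout and covers only a subset of the fields listed above. The class name, field names, and placeholder values are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ADGReceipt:
    # Provenance: what was analyzed, and with which tooling.
    repository_snapshot: str
    analyzer_version: str
    framework_versions: dict
    registry_version: str
    # Graph summary (a subset of the fields a full receipt would carry).
    agents: list = field(default_factory=list)
    capabilities: list = field(default_factory=list)
    approval_gates: list = field(default_factory=list)
    prompt_to_tool_paths: list = field(default_factory=list)
    unresolved_dynamic_behavior: list = field(default_factory=list)

    def missing_provenance(self):
        """Names of provenance fields that were left empty."""
        required = ("repository_snapshot", "analyzer_version",
                    "framework_versions", "registry_version")
        return [name for name in required if not getattr(self, name)]

    def to_json(self):
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; a real receipt would carry real identifiers.
receipt = ADGReceipt(
    repository_snapshot="<commit hash>",
    analyzer_version="<analyzer version>",
    framework_versions={"<framework>": "<version>"},
    registry_version="<registry version>",
    agents=["triage"],
    capabilities=["send_email"],
    unresolved_dynamic_behavior=["tool list built at run time"],
)
print(receipt.missing_provenance())
print(receipt.to_json())
```

Recording unresolved dynamic behavior as its own field matters as much as the graph itself, because it tells the reviewer where the static picture is known to be incomplete.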

For procurement, incident review, compliance, or internal launch approval, "show the dependency graph" is a better question than "is there a tool list?" A tool list says what exists. A dependency graph says what can reach what, under which framework abstraction, and where a control point is supposed to intervene. That does not solve agent safety. It makes the system less dependent on verbal assurances.
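The "where a control point is supposed to intervene" part can also be checked mechanically once paths and policy nodes are named. This is a deliberately small sketch with hypothetical node names. It inspects only the paths it is given.

```python
def ungated_paths(paths, policy_nodes):
    """Return the paths that never pass through a declared policy node."""
    gates = set(policy_nodes)
    return [path for path in paths if not gates.intersection(path)]


# Hypothetical paths from untrusted input to two privileged tools.
paths = [
    ["user_input", "triage", "human_approval", "send_email"],
    ["user_input", "triage", "run_shell"],
]
print(ungated_paths(paths, policy_nodes=["human_approval"]))
```

If a second route to the same tool bypasses the gate, it shows up as its own entry, so the check is only as complete as the list of paths fed into it.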

For Spiralism, the interesting cultural pattern is that agency keeps being sold at the surface while dependency remains hidden underneath. AgentFlow pushes the record downward into the code-shaped machinery: the prompt that binds the agent, the memory that carries context, the capability that changes the world, and the policy that may or may not stop it.

Claim Boundary

The paper is not proof that every flagged project is exploitable. The authors state that AgentFlow over-approximates control and data dependencies, which can introduce false positives. The tool is also focused on Python programs and five frameworks, is less effective for project-specific agent structures that do not follow recognizable framework APIs, and needs registry updates as frameworks change. ADG analysis captures high-level framework semantics and does not replace lower-level program dependency analysis inside tool implementations or libraries.

That boundary is the reason the paper is useful. It does not ask readers to believe that a static graph fully understands an agent. It asks reviewers to stop treating agent repositories as ordinary code when the operational semantics live partly in framework conventions. Before the demo, before the benchmark, and before the policy memo, there should be a graph of what the agent can touch.

Sources

Shenao Wang, Xinyi Hou, Yanjie Zhao, Xiao Cheng, and Haoyu Wang, AgentFlow: Building Agent Dependency Graphs for Static Analysis of Agent Programs, arXiv:2607.01640 [cs.SE, cs.CR].
arXiv HTML for AgentFlow, checked for abstract, scope, ADG representation, evaluation setup, AgentZoo, Agent BOM results, prompt-to-tool results, scalability, discussion, and limitations.
arXiv PDF for AgentFlow, checked against the same metadata, result tables, manual-validation notes, and limitation claims.
