Last reviewed June 24, 2026

The Coding Agent Becomes the Commit Fingerprint

The June 2026 arXiv paper Detecting AI Coding Agents in Open Source, by Arsham Khosravani and Audris Mockus, treats coding-agent adoption as a measurement problem: the agent is often present, but the repository may not say so in one obvious place.

Trace, Not Confession

The paper, arXiv:2606.24429 [cs.SE], was submitted on June 23, 2026. Its premise is practical: AI coding agents have entered open-source work, but their traces are inconsistent. Some tools use bot accounts. Some leave commit-message signatures. Some appear through human author-name patterns. Some leave only project configuration files. A repository can therefore contain agent-mediated work without looking like a bot-authored repository.

This is a different angle from coding agents as maintainers, machine contributors as maintainer workload, and pull requests as prompt-injection surfaces. Those pieces ask how agent-authored changes should be reviewed and secured. Khosravani and Mockus ask a prior question: can we even measure where the agents are?

Four Signals

The authors build a multi-layered detector over World of Code snapshots spanning December 2024 through April 2026 and covering more than 180 million Git repositories. The framework classifies agent traces into four behavioral types: centralized bot accounts, commit-message signatures, distributed human attribution patterns, and configuration-file-only presence.

That taxonomy is the paper's institutional contribution. A bot account is an explicit actor. A signature is a receipt embedded in a commit. A human-name suffix is a developer-mediated attribution convention. A configuration file is weaker evidence: it may show that a project is prepared for a tool even when individual commits do not identify agent use. Treating those as one undifferentiated "AI commit" category would make the measurement look cleaner than the evidence permits.
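One way to keep the four trace types separate in practice is to give each its own check and its own label. The sketch below is illustrative only: the bot email, signature line, author-name suffix, and configuration path are hypothetical placeholders, not the rules Khosravani and Mockus use, and the `Commit` and `TraceType` names are invented for this example.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Set


class TraceType(Enum):
    BOT_ACCOUNT = "centralized bot account"
    MESSAGE_SIGNATURE = "commit-message signature"
    HUMAN_ATTRIBUTION = "distributed human attribution pattern"
    CONFIG_ONLY = "configuration-file-only presence"


@dataclass
class Commit:
    author_name: str
    author_email: str
    message: str


# Hypothetical placeholder patterns (assumptions, not the paper's detector rules).
BOT_EMAILS = {"agent-bot@example.invalid"}
SIGNATURE_RE = re.compile(r"^Generated-By: example-agent\b", re.MULTILINE)
NAME_SUFFIX_RE = re.compile(r"\(via example-agent\)\s*$")
CONFIG_PATHS = {".example-agent/config.json"}


def classify_commit(commit: Commit) -> Optional[TraceType]:
    """Return the strongest commit-level trace type, or None if no trace is found."""
    if commit.author_email.lower() in BOT_EMAILS:
        return TraceType.BOT_ACCOUNT
    if SIGNATURE_RE.search(commit.message):
        return TraceType.MESSAGE_SIGNATURE
    if NAME_SUFFIX_RE.search(commit.author_name):
        return TraceType.HUMAN_ATTRIBUTION
    return None


def classify_project(commits: Iterable[Commit], paths: Iterable[str]) -> Set[TraceType]:
    """Collect commit-level trace types; fall back to config-only if none exist."""
    found = {t for t in map(classify_commit, commits) if t is not None}
    if not found and CONFIG_PATHS & set(paths):
        # Weakest evidence: the project is prepared for a tool, nothing more.
        found.add(TraceType.CONFIG_ONLY)
    return found
```

Returning a set of labels rather than a single boolean keeps the weaker configuration-file evidence from being silently merged with explicit bot or signature evidence.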

The Undercount

The sharpest result is a warning about single-signal studies. In the V2510 snapshot, multi-method detection identifies 850,157 Claude Code commits. Bot-account lookup alone recovers 28,154 of them, or 3.3 percent. The paper frames that as a 30x relative-recall gap, with the union still a lower bound rather than a perfect census.
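The arithmetic behind that gap is easy to reproduce from the two counts quoted above. The variable names are ours; only the two commit counts come from the paper.

```python
# Counts reported for the V2510 snapshot.
union_commits = 850_157      # Claude Code commits found by multi-method detection
bot_only_commits = 28_154    # subset recovered by bot-account lookup alone

# Recall of the single signal relative to the multi-method union.
relative_recall = bot_only_commits / union_commits
# How many times larger the union is than the single-signal count.
gap = union_commits / bot_only_commits

print(f"relative recall: {relative_recall:.1%}")  # relative recall: 3.3%
print(f"gap: {gap:.1f}x")                         # gap: 30.2x
```

Because the union is itself a lower bound, this is recall relative to what the detector can see, not recall against true agent use.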

The scale matters. Across later snapshots, the authors report commit-attributed agents generating more than 320,000 commits per month by V2604. Claude Code leads the commit-based count with 886,122 commits across 17,295 projects. The configuration-file census finds Claude-related configuration files in 21,078 projects, represented by 888,177 blob occurrences. The paper hand-validates 495 labels and reports per-cell precision with Wilson confidence intervals, a useful check against treating heuristic detection as certainty.
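For readers who have not met it, the Wilson score interval is a standard confidence interval for a proportion that behaves better than the naive normal approximation when samples are small or precision is near 100 percent. The sketch below implements the textbook formula. The 48-of-50 example is hypothetical, since this post does not reproduce the paper's per-cell counts.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical validation cell: 48 of 50 hand-checked labels were correct.
low, high = wilson_interval(48, 50)
print(f"precision 0.96, 95% Wilson interval: [{low:.3f}, {high:.3f}]")
```

The interval is not symmetric around the observed precision, which is the point: a cell validated on a few dozen labels cannot support a claim of near-perfect precision.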

The Channel Bias

The comparison with AIDev is the second governance lesson. AIDev is a dataset of agent-authored pull requests. Khosravani and Mockus compare that PR channel with their own commit channel and find that the two see different populations. Codex dominates the AIDev PR count but leaves few commit-level traces in this paper's detector. Claude Code dominates the commit channel but is sparse in the PR channel. The paper says a PR census misses 79 percent of commit-detected Claude Code adopters, while a commit-based census misses essentially all Codex adopters.
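A miss rate of this kind is a set difference: the share of adopters seen in one channel that the other channel never sees. The sketch below uses toy repository names and a helper of our own, `missed_share`. It does not reproduce the paper's data or matching procedure.

```python
def missed_share(reference: set, other: set) -> float:
    """Share of adopters in `reference` that the `other` channel does not see."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(reference - other) / len(reference)


# Toy data: hypothetical projects detected through each channel.
commit_adopters = {"org/a", "org/b", "org/c", "org/d", "org/e"}
pr_adopters = {"org/e", "org/f"}

print(missed_share(commit_adopters, pr_adopters))  # 0.8: PR census misses 4 of 5
print(missed_share(pr_adopters, commit_adopters))  # 0.5: commit census misses 1 of 2
```

The measure is directional, so each channel needs its own miss rate. A single overlap figure would hide the asymmetry the paper reports between Claude Code and Codex.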

The channel also changes the story of what agents do. Using AIDev task labels, the paper reports that cloud agents surfaced through PRs skew toward feature work, while in-editor or direct-commit agents surface as maintenance. That does not prove one tool is inherently a feature builder and another is inherently a janitor. It shows that deployment mode and detection channel shape the measured work profile.

Limits That Matter

The paper is still an arXiv preprint. It measures traces, not all actual agent use. A configuration file can exist without active use. A silent IDE assistant can influence code without leaving a commit signature. A squash merge can erase provenance. A human can copy agent output into a normal commit. Detection is therefore a lower-bound governance instrument, not a confession machine.

The authors also distinguish agent-touched work from fully autonomous agent-authored work. In their validation, most Type B true positives are primary agent-authored commits, but a smaller trailer category represents human-authored commits with agent co-authorship. For legal, maintenance, and security review, that distinction matters. A commit with an agent trace may deserve extra provenance scrutiny, but it should not automatically be treated as a machine-only act.

The most important limit is social. Even excellent detection can only show traces that tools, platforms, and maintainers choose to emit. If agent attribution becomes reputationally or legally costly, incentives may push toward quieter workflows. Measurement will then depend on standards, norms, and procurement requirements, not only better regexes.

Governance Standard

Open-source projects should treat coding-agent attribution as supply-chain metadata. A useful project standard would require agent name, version or service, deployment mode, human reviewer, prompt or task summary when appropriate, generated-file scope, and whether the agent opened a PR, committed directly, or only assisted a human.

Platform-level standards should make that metadata structured rather than decorative. Commit trailers, PR fields, provenance attestations, and project-level configuration files should be machine-readable and auditable without forcing maintainers to guess from style. The practical rule is simple: if an agent helped change the code, the repository should preserve a trace that future maintainers can search, contest, and evaluate.
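To make "structured rather than decorative" concrete, here is a minimal sketch of commit trailers carrying the fields listed above, parsed into a typed record. The trailer keys (`Agent-Name`, `Agent-Version`, `Agent-Mode`, `Agent-Role`, `Agent-Task`, `Agent-Scope`) and the `AgentAttribution` class are hypothetical names invented for this post, not an existing standard. The parser is also a simplification that only reads `Key: value` lines in the final paragraph of the message.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trailer keys (lowercased) mapped to record fields.
TRAILER_FIELDS = {
    "agent-name": "agent_name",
    "agent-version": "agent_version",
    "agent-mode": "deployment_mode",   # e.g. cloud, in-editor
    "agent-role": "role",              # opened-pr, committed-directly, assisted
    "reviewed-by": "human_reviewer",
    "agent-task": "task_summary",
    "agent-scope": "generated_scope",
}


@dataclass
class AgentAttribution:
    agent_name: str
    agent_version: Optional[str] = None
    deployment_mode: Optional[str] = None
    role: Optional[str] = None
    human_reviewer: Optional[str] = None
    task_summary: Optional[str] = None
    generated_scope: Optional[str] = None


def parse_attribution(message: str) -> Optional[AgentAttribution]:
    """Read attribution trailers from the last paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    fields = {}
    for line in last_paragraph.splitlines():
        key, sep, value = line.partition(":")
        attr = TRAILER_FIELDS.get(key.strip().lower())
        if sep and attr and value.strip():
            fields[attr] = value.strip()
    if "agent_name" not in fields:
        return None  # no agent attribution recorded
    return AgentAttribution(**fields)


example = (
    "Fix parser edge case\n\n"
    "Handle empty input without raising.\n\n"
    "Agent-Name: example-agent\n"
    "Agent-Version: 1.2\n"
    "Agent-Role: committed-directly\n"
    "Reviewed-by: A. Maintainer <m@example.invalid>\n"
)
print(parse_attribution(example))
```

A record like this can be searched and audited without guessing from style. It also keeps the distinction between an agent that committed directly and one that only assisted a human.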

Sources

Arsham Khosravani and Audris Mockus, Detecting AI Coding Agents in Open Source: A Validated Multi-Method Census of 180 Million Repositories, arXiv:2606.24429 [cs.SE], submitted June 23, 2026.
arXiv PDF for Detecting AI Coding Agents in Open Source, reviewed June 24, 2026.
Hao Li, Haoxiang Zhang, and Ahmed E. Hassan, AIDev: Studying AI Coding Agents on GitHub, arXiv:2602.09185 [cs.SE], February 2026.
Related pages: The Coding Agent Becomes the Maintainer, The Machine Contributor Becomes the Maintainer Tax, The Pull Request Becomes the Prompt Injector, The Agent Trace Becomes the Process Map, and Vibe Coding.
