Blog · arXiv Analysis · June 25, 2026

The Risk Taxonomy Becomes the Audit Spine

Gemma Galdon Clavell, Pablo Accuosto, and Usman Gohar's July 2026 arXiv paper presents the Eticas AI Risk Taxonomy v2.0.0 as open infrastructure for operationalizing AI audits.

For this essay, an audit-spine receipt is the record that carries a named AI risk from taxonomy identifier to mechanism, probe, metric value, severity band, grade, evidence source, framework mapping, and disclosed limitation.

The Claim

The paper, arXiv:2607.02201 [cs.CY], was submitted on July 2, 2026 under the title The Eticas AI Risk Taxonomy: Open Infrastructure for Operationalizing AI Audits. Its central complaint is practical: the AI governance field has many risk taxonomies, but most of them stop at naming risks and never show how an audit is executed.

The authors say at least 74 AI risk taxonomies exist. Their proposed answer is more than another glossary: it is a vocabulary tied to an operational chain that runs from risk to mechanism, probe, metric, severity, grade, and reportable finding.

The Operational Layer

The Eticas taxonomy v2.0.0 is described as comprising 10 top-level categories, 20 sub-groups, and 76 active subcategories. The paper says each entry carries a stable identifier, a definition, a lifecycle-stage annotation, a mechanism list, and mappings to 18 external frameworks across compliance, reference, and academic tiers.

The important move is the separation between a risk and the mechanisms by which it surfaces. A risk such as PII leakage is not treated as one undifferentiated label. It is decomposed into mechanisms such as disclosure, memorization, and cross-customer contamination. Tests can then attach to mechanism identifiers rather than to vague risk names.
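The separation between a risk and its mechanisms can be sketched as a small data structure. This is a minimal illustration, not the Eticas schema: the class names and the identifiers (`RISK-PII`, `RISK-PII/disclosure`, and so on) are hypothetical, invented here only to show probes attaching to mechanism identifiers rather than to the risk label.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mechanism:
    """One way a risk can surface. The identifier format is hypothetical."""
    mechanism_id: str
    label: str


@dataclass
class Risk:
    """A named risk decomposed into the mechanisms by which it surfaces."""
    risk_id: str
    label: str
    mechanisms: list[Mechanism] = field(default_factory=list)


@dataclass(frozen=True)
class Probe:
    """A test that targets a mechanism identifier, not a risk name."""
    name: str
    mechanism_id: str


def untested_mechanisms(risk: Risk, probes: list[Probe]) -> list[str]:
    """Return the mechanism ids of this risk that no probe targets."""
    covered = {p.mechanism_id for p in probes}
    return [m.mechanism_id for m in risk.mechanisms if m.mechanism_id not in covered]


pii = Risk(
    risk_id="RISK-PII",  # hypothetical identifier
    label="PII leakage",
    mechanisms=[
        Mechanism("RISK-PII/disclosure", "disclosure"),
        Mechanism("RISK-PII/memorization", "memorization"),
        Mechanism("RISK-PII/cross-customer", "cross-customer contamination"),
    ],
)
probes = [Probe("privacy-scenario-probe", "RISK-PII/disclosure")]
gaps = untested_mechanisms(pii, probes)
```

Keying probes to mechanisms makes coverage gaps explicit: here `gaps` lists the memorization and cross-customer mechanisms, which the single probe does not touch.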

The paper also describes a four-layer methodology: reusable foundations, technology-specific core methods, sector annexes, and project-specific audit instantiations. In other words, an auditor should not reinvent what PII leakage means in every engagement, but the auditor still has to execute checks against the actual system under review.

The PII Example

The worked example is PII leakage. The paper maps that risk to seven external frameworks and follows the disclosure mechanism through DecodingTrust Privacy Scenario 2 on GPT-4-0314. Under increasing adversarial conditioning, the paper reports disclosure rates of 0 percent, 51 percent, and 84 percent. Those rates map to severities 1, 4, and 5, producing a subcategory grade of E with a SYSTEMIC pattern flag.

The point is not that GPT-4-0314 is today's deployed system. The paper frames the result as a public-benchmark proxy rather than a client engagement. The useful contribution is the chain: the same named risk can be traced through probes, values, thresholds, aggregation rules, and a final grade.
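The chain from metric value to grade can be sketched in a few lines. This essay does not reproduce the paper's thresholds or aggregation rules, so everything below is assumed: the band cut points were picked only so that the reported rates land on the reported severities, and the worst-case grading and the two-probe pattern rule are illustrative stand-ins, not the Eticas method.

```python
# Hypothetical severity bands: (upper bound on disclosure rate in percent, severity).
# These cut points are assumptions chosen so the reported rates (0, 51, 84)
# land on the reported severities (1, 4, 5); they are not the paper's thresholds.
ASSUMED_BANDS = [(1.0, 1), (10.0, 2), (30.0, 3), (60.0, 4), (100.0, 5)]

# Assumed one-to-one mapping from worst severity to letter grade.
ASSUMED_GRADES = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}


def severity(rate_percent: float) -> int:
    """Map a disclosure rate to a severity using the assumed bands."""
    if not 0.0 <= rate_percent <= 100.0:
        raise ValueError("rate must be between 0 and 100")
    for upper, sev in ASSUMED_BANDS:
        # The last band is closed on the right so that 100 percent maps to 5.
        if rate_percent < upper or upper == 100.0:
            return sev
    raise AssertionError("unreachable")


def grade(severities: list[int]) -> tuple[str, bool]:
    """Grade by worst case; flag a pattern when two or more probes reach severity 4+.

    Both rules are illustrative assumptions.
    """
    worst = max(severities)
    systemic = sum(s >= 4 for s in severities) >= 2
    return ASSUMED_GRADES[worst], systemic


reported_rates = [0.0, 51.0, 84.0]  # disclosure rates from the paper's worked example
severities = [severity(r) for r in reported_rates]
subcategory_grade, systemic_flag = grade(severities)
```

Under these assumed rules the three reported rates yield severities 1, 4, and 5, a grade of E, and a raised pattern flag, matching the outcome the paper reports. A different set of cut points would move the grade, which is why thresholds belong in the audit record.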

Semantic Infrastructure

The public taxonomy surface is published under CC BY 4.0 with stable URIs, SKOS/Turtle, and JSON-LD distributions at taxonomy.eticas.ai. That matters because audit vocabulary becomes machine-readable infrastructure rather than PDF-only compliance language. A regulator, auditor, buyer, or tool can point to a concept URI and know what risk vocabulary is being invoked.

This is where a taxonomy becomes a spine. A finding, model card, procurement review, or incident record can cite the same concept rather than inventing local labels. That does not make the audit correct, but it makes disagreement easier to locate: at the concept, mechanism, probe, metric, threshold, or grade.
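What a machine-readable concept buys a tool can be shown with a short sketch. The JSON-LD fragment below is hypothetical: the concept URI path and the mapped framework URIs are placeholders, and the compacted `skos:` key layout is an assumption about the published distribution. Only the SKOS property names (`prefLabel`, `exactMatch`, `closeMatch`, `broadMatch`, `narrowMatch`, `relatedMatch`) are standard.

```python
import json

# Hypothetical JSON-LD fragment. The concept URI path and the framework URIs
# are placeholders; only the SKOS property names are standard vocabulary.
DOC = """
{
  "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
  "@id": "https://taxonomy.eticas.ai/EXAMPLE-CONCEPT",
  "@type": "skos:Concept",
  "skos:prefLabel": "PII leakage",
  "skos:exactMatch": [{"@id": "https://example.org/framework-a/privacy-1"}],
  "skos:relatedMatch": [{"@id": "https://example.org/framework-b/data-7"}]
}
"""

# SKOS mapping properties, ordered roughly from strongest to weakest claim.
MAPPING_PROPERTIES = ["exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"]


def mappings_by_strength(concept: dict) -> dict[str, list[str]]:
    """Group a concept's external mappings by SKOS mapping property."""
    out: dict[str, list[str]] = {}
    for prop in MAPPING_PROPERTIES:
        targets = concept.get(f"skos:{prop}", [])
        if targets:
            out[prop] = [t["@id"] for t in targets]
    return out


concept = json.loads(DOC)
mappings = mappings_by_strength(concept)
```

A production consumer would more likely use a JSON-LD or RDF library than raw dictionary access, since real documents may be expanded or use different prefixes. The sketch only shows that a finding can cite one URI and recover both the label and the strength of each framework mapping.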

The Agentic Gap

The paper treats Agentic AI as a first-class established category. It argues that agentic systems decompose goals, invoke tools, persist state, adapt plans, and coordinate with other agents in ways that differ from passive prediction systems. The authors say the framework mappings show a structural gap: major regulations and management-system standards predate agentic AI as a deployment paradigm, so agent-specific frameworks are needed to supplement general compliance.

Governance Reading

The Spiralist reading is that audit language has to become executable without pretending that execution is neutral. A taxonomy can discipline vague governance talk, but the real judgment lives in test design, access level, calibration, aggregation, and what remains proprietary.

That makes the paper valuable and contestable. It pushes beyond risk-name theater by showing a concrete measurement-to-grade chain. It also asks readers to accept an open-core boundary: the conceptual scaffold and worked examples are public, while the full subcategory set, methodology repository, and engagement data are not publicly available.

Audit-Spine Receipts

An audit-spine receipt should name the taxonomy version, concept URI, maturity status, external framework mappings, mechanism identifier, probe design, benchmark or live-data source, model or system version, metric definition, raw value, severity band, aggregation rule, final grade, pattern flag, and reviewer judgment.

It should also record empty slots. Which mechanisms are known but not operationalized? Which subcategories are emerging or internal-only? Which framework matches are exact, close, broad, narrow, or merely related? Which calibration rules are public, and which live in a practitioner layer? Those gaps are part of the audit record, not embarrassing footnotes.
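A receipt with explicit empty slots can be sketched as a record whose unfilled fields are reported, not hidden. The class and field names are this essay's own illustration, not a published Eticas format. The filled values come from the paper's worked example as summarized above; everything this essay does not state is left as `None` on purpose.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AuditSpineReceipt:
    """Hypothetical receipt layout; every slot defaults to empty."""
    taxonomy_version: Optional[str] = None
    concept_uri: Optional[str] = None
    maturity_status: Optional[str] = None
    framework_mappings: Optional[list[str]] = None
    mechanism_id: Optional[str] = None
    probe_design: Optional[str] = None
    data_source: Optional[str] = None
    system_version: Optional[str] = None
    metric_definition: Optional[str] = None
    raw_value: Optional[float] = None
    severity_band: Optional[int] = None
    aggregation_rule: Optional[str] = None
    final_grade: Optional[str] = None
    pattern_flag: Optional[str] = None
    reviewer_judgment: Optional[str] = None

    def empty_slots(self) -> list[str]:
        """List the fields that were never filled, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


receipt = AuditSpineReceipt(
    taxonomy_version="2.0.0",
    mechanism_id="disclosure",  # mechanism label; the real identifier is not given here
    data_source="DecodingTrust Privacy Scenario 2",
    system_version="GPT-4-0314",
    raw_value=84.0,  # disclosure rate (percent) under the strongest conditioning
    severity_band=5,
    final_grade="E",
    pattern_flag="SYSTEMIC",
)
missing = receipt.empty_slots()
```

Here `missing` names seven unfilled slots, including the concept URI, the metric definition, and the aggregation rule. Printing that list alongside the grade keeps the gaps in the record instead of leaving them implicit.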

Limits

The paper discloses that the authors are affiliated with Eticas, which provides commercial AI audit services and maintains proprietary methodology assets. That does not invalidate the contribution, but it changes how the reader should use it. The public taxonomy is evidence of a reusable vocabulary and worked pattern, not proof that any particular private audit is sound.

The page therefore treats the taxonomy as governance infrastructure, not as a certificate. An organization still has to show system access, test coverage, affected populations, data provenance, human review, remediation path, and post-deployment monitoring before an audit grade deserves operational trust.

Source Discipline

This page uses the arXiv abstract, arXiv HTML paper, arXiv API metadata, and the public taxonomy site as primary sources for title, authorship, submission date, taxonomy version, category counts, mappings, PII example, license claims, agentic-AI category, and disclosed limitations. It does not independently validate the Eticas methodology, rerun DecodingTrust, or inspect proprietary audit materials.

AI Governance, AI Audit Trails, AI Evaluations, Algorithmic Impact Assessments, EU AI Act, NIST AI Risk Management Framework, and ISO/IEC 42001 cover the surrounding governance frame.
The Evaluation Schema Becomes the Public Ledger, The AI Audit Becomes the Compliance Interface, The Agent Action Becomes the Legal Perimeter, and The Agentic Model Becomes the Validation Problem give adjacent audit patterns.

Sources

Gemma Galdon Clavell, Pablo Accuosto, and Usman Gohar, The Eticas AI Risk Taxonomy: Open Infrastructure for Operationalizing AI Audits, arXiv:2607.02201 [cs.CY], submitted July 2, 2026.
arXiv HTML: arXiv:2607.02201 HTML, reviewed for the operationalization layer, PII worked example, taxonomy design principles, framework alignment analysis, agentic-AI gap, conflict-of-interest statement, and data-availability statement.
Paper PDF: arXiv:2607.02201 PDF.
Public taxonomy: Eticas AI Risk Taxonomy, checked for the public taxonomy surface and resolvable source link.
