Wiki · Concept · Last reviewed June 14, 2026

AI Audits and Third-Party Assurance

AI audits are evidence-producing reviews of an AI system, organization, or deployment context used to test claims about risk, compliance, performance, and accountability. Third-party assurance adds a further claim: that an outside actor has examined the evidence rather than leaving the builder to grade itself.

Definition

An AI audit is a disciplined examination of an AI system, its development process, its deployment environment, or its organizational controls. It may ask whether a system works as claimed, whether risks have been identified and mitigated, whether legal duties have been met, whether affected people have paths to challenge outcomes, and whether records exist for later investigation.

The audit object matters. A model audit may test a model under controlled conditions. A system audit may include prompts, retrieval, tools, user interface, monitoring, vendor contracts, human workflow, and update policy. An organizational audit may examine whether the institution has durable controls for governing many AI systems over time.

The word "audit" is used loosely in AI. It can mean internal governance review, external red teaming, bias testing, cybersecurity review, documentation inspection, data governance assessment, conformity assessment, procurement due diligence, incident investigation, or formal certification against a management-system standard.

Third-party assurance is narrower. It requires some separation between the audited organization and the reviewer. Independence can be strong, weak, or compromised depending on who pays, what access is granted, what can be disclosed, whether conflicts are visible, and whether the auditor can report negative findings without retaliation.

A serious assurance claim should therefore name its scope, audience, criteria, evidence base, date, system version, limitations, and consequences. "Audited" is a weak claim unless readers can tell what was audited, by whom, against what standard, with what access, and what changed after findings.
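The fields listed above can be read as a completeness check on any "audited" claim. The following is a minimal sketch under stated assumptions: the `AssuranceClaim` class, its field names, and the `missing_fields` helper are hypothetical illustrations, not part of any standard or audit framework, and the example values are invented.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AssuranceClaim:
    """Hypothetical record of what an 'audited' claim should disclose."""
    scope: Optional[str] = None
    audience: Optional[str] = None
    criteria: Optional[str] = None
    evidence_base: Optional[str] = None
    date: Optional[str] = None
    system_version: Optional[str] = None
    limitations: Optional[str] = None
    consequence: Optional[str] = None


def missing_fields(claim: AssuranceClaim) -> list[str]:
    """Return the names of fields that are absent or blank."""
    return [
        f.name
        for f in fields(claim)
        if not (getattr(claim, f.name) or "").strip()
    ]


# Invented example: a claim that names scope, criteria, and date only.
claim = AssuranceClaim(
    scope="resume-screening model, English-language applications only",
    criteria="internal fairness policy v2",
    date="2026-03-01",
)
print(missing_fields(claim))
# ['audience', 'evidence_base', 'system_version', 'limitations', 'consequence']
```

A claim with many missing fields is not necessarily false, but a reader cannot tell what it covers.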

Why It Matters

AI systems now make or influence decisions in settings where ordinary users cannot inspect the model, data, logs, vendor contracts, evaluation failures, or incident history. Without audit rights and audit evidence, institutional claims of safety become difficult to distinguish from marketing.

Audits are also a bridge between technical evaluation and public accountability. A benchmark score can say something about model behavior under test. An audit can ask a wider question: whether the organization has a repeatable process for knowing what it built, where it is used, who is affected, how it fails, and what changes after failure.

For powerful AI systems, the audit layer becomes political infrastructure. It determines who gets to see inside the machine: only vendors, or also selected customers, regulators, courts, researchers, civil society, affected communities, or the public.

Types of AI Audit

Internal audit. The organization reviews its own AI development, deployment, or use. This can be useful for continuous governance, but it carries the obvious risk of self-protection and selective attention.

Second-party audit. A customer, contractor, investor, insurer, or platform partner examines the system or requests evidence. This can create real pressure but may still be shaped by commercial dependence.

Third-party audit. An outside reviewer examines the system under a defined scope. The value depends on independence, competence, access, liability, publication rights, and whether findings can alter deployment.

Regulatory inspection. A public authority or legally empowered body reviews compliance, evidence, and controls. This may include powers unavailable to ordinary researchers, such as compulsory information requests.

Public-interest audit. Researchers, journalists, civil society groups, or affected communities test systems from the outside. These audits can reveal harms hidden by vendors, but often lack access to logs, source material, and internal decision records.

Certification audit. An assessor reviews whether an organization conforms to a standard or management system, such as an AI management system. Certification is not the same thing as proving that every deployed model is safe.

Audit Evidence

A credible AI audit needs evidence that survives beyond a slide deck. Relevant evidence can include model cards, system cards, risk registers, evaluation results, red-team findings, data provenance records, training and fine-tuning summaries, access-control records, logging policy, incident reports, override records, post-market monitoring, procurement materials, user notices, appeal records, and governance meeting decisions.

For agentic systems, audit evidence should include tool permissions, action traces, retrieved content, prompt and policy versions, sandbox boundaries, credential use, human approvals, rollback records, and exceptions. Without runtime evidence, agent governance becomes mostly retrospective storytelling.
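One way to picture runtime evidence is a structured record written for every agent action. The sketch below is an assumption-laden illustration, not a reference implementation: `ActionTrace`, `record_action`, `to_log_line`, and all field names are hypothetical, and a real deployment would also need tamper-resistant storage and retention rules that are out of scope here.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ActionTrace:
    """Hypothetical per-action audit record for an agentic system."""
    timestamp: str
    tool: str
    arguments: dict
    prompt_version: str
    policy_version: str
    credential_id: Optional[str]
    approved_by: Optional[str]      # human approver, if any
    within_permissions: bool        # False marks an exception to review


def record_action(tool, arguments, permitted_tools, prompt_version,
                  policy_version, credential_id=None, approved_by=None):
    """Build a trace and flag calls to tools outside the permitted set."""
    return ActionTrace(
        timestamp=datetime.now(timezone.utc).isoformat(),
        tool=tool,
        arguments=arguments,
        prompt_version=prompt_version,
        policy_version=policy_version,
        credential_id=credential_id,
        approved_by=approved_by,
        within_permissions=tool in permitted_tools,
    )


def to_log_line(trace: ActionTrace) -> str:
    """Serialize one trace as a single JSON line for an append-only log."""
    return json.dumps(asdict(trace), sort_keys=True)


# Invented example: the agent calls a tool it was not granted.
trace = record_action(
    tool="send_email",
    arguments={"to": "applicant@example.com"},
    permitted_tools={"search_docs", "summarize"},
    prompt_version="prompt-v7",
    policy_version="policy-v3",
)
print(to_log_line(trace))
```

Recording the prompt and policy versions alongside each action lets a later reviewer reconstruct which rules were in force when the action happened.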

Audit evidence should also include negative evidence: failed tests, excluded use cases, rejected mitigations, unresolved limitations, known blind spots, and conditions under which the system must not be used.

Source discipline matters inside the audit itself. Evidence should distinguish primary records from vendor summaries, reproducible tests from one-off demonstrations, pre-release results from deployed-system monitoring, and public disclosures from regulator-only or customer-only material. The audit should preserve enough metadata for later review: dates, model and dataset versions, prompts or test cases where disclosure is safe, sampling method, evaluator qualifications, conflicts of interest, and unresolved uncertainty.

Governance Implications

Audit rights allocate power. If only the vendor can inspect the system, assurance becomes self-certification. If customers, regulators, courts, researchers, insurers, workers, or affected communities receive defined evidence rights, assurance no longer has to rest on trust alone.

Procurement and regulation should treat audits as decision tools, not decorative artifacts. A useful audit can trigger remediation, deployment delay, contract conditions, monitoring duties, incident reporting, user notice, public disclosure, or withdrawal. If no one can act on the finding, the audit is evidence without leverage.

High-impact AI also needs continuity. A point-in-time report cannot cover later model updates, changed prompts, new retrieval sources, new user populations, added tools, or altered business incentives. Governance should specify when re-audit is required, which changes must be logged, and who can suspend use while evidence is incomplete.
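A re-audit rule of this kind can be sketched as a comparison between the configuration that was audited and the configuration now deployed. The component names, the `changed_components` and `reaudit_required` helpers, and the example snapshots below are all hypothetical; which changes actually trigger re-audit is a governance decision, not something code can settle.

```python
# Hypothetical list of components whose change should trigger re-audit.
AUDITED_COMPONENTS = (
    "model_version",
    "system_prompt",
    "retrieval_sources",
    "tools",
    "user_population",
)


def changed_components(audited: dict, current: dict) -> list[str]:
    """List tracked components that differ from the audited snapshot."""
    return [
        name
        for name in AUDITED_COMPONENTS
        if audited.get(name) != current.get(name)
    ]


def reaudit_required(audited: dict, current: dict) -> bool:
    """True if any tracked component changed since the audit."""
    return bool(changed_components(audited, current))


# Invented snapshots: one tool was added after the audit.
audited = {
    "model_version": "m-2026-01",
    "system_prompt": "prompt-v7",
    "retrieval_sources": frozenset({"policy_handbook"}),
    "tools": frozenset({"search_docs"}),
    "user_population": "internal staff",
}
current = dict(audited, tools=frozenset({"search_docs", "send_email"}))

print(changed_components(audited, current))  # ['tools']
print(reaudit_required(audited, current))    # True
```

The comparison only works if the listed changes are logged in the first place, which is why logging duties and re-audit duties belong in the same governance rule.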

NIST AI RMF and TEVV. NIST describes the AI Risk Management Framework as a voluntary framework for improving the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems. It is not itself an audit law, but it supplies a common structure for governance, mapping, measurement, and management. NIST's test, evaluation, validation, and verification work is relevant because audits need valid measurement practices, not only policy language.

ISO/IEC 42001 and 42006. ISO/IEC 42001:2023 specifies requirements for establishing, implementing, maintaining, and continually improving an Artificial Intelligence Management System. ISO/IEC 42006:2025 adds requirements for bodies that audit and certify AI management systems against ISO/IEC 42001. Together they move assurance from one-off testing toward documented organizational processes, auditor competence, consistent certification practice, and continual improvement. They do not prove that every product or model within a certified organization is safe.

ISO/IEC 42005. ISO/IEC 42005:2025 provides guidance for AI system impact assessments. It is adjacent to audit because impact assessment asks organizations to identify and document foreseeable effects on individuals, groups, and society before and during use.

EU AI Act conformity assessment. The EU AI Act requires conformity assessment for high-risk AI systems. Article 43 distinguishes several assessment routes: internal control for many Annex III high-risk systems, notified-body involvement for some biometric systems under certain conditions, and integration with existing product-safety conformity assessment for regulated products. High-risk AI systems that are substantially modified must undergo a new conformity assessment unless the change was predetermined and documented. This means the legal meaning of "assessment" varies by system type, standards availability, and regulatory context.

EU AI Act timing. The Act applies in phases. General provisions, AI literacy duties, and prohibited-practice rules began applying on February 2, 2025. General-purpose AI rules began applying on August 2, 2025. The Commission's implementation timeline says the majority of rules, including many Annex III high-risk requirements and Article 50 transparency rules, are scheduled to apply from August 2, 2026, while rules for high-risk AI embedded in regulated products follow on August 2, 2027. The Digital Omnibus process may affect timing for some high-risk rules by linking application to support tools such as harmonised standards.

U.S. federal agency use. OMB Memorandum M-25-21 rescinded and replaced M-24-10 in 2025. It keeps AI use inside agency governance by directing agencies toward responsible adoption, safeguards for privacy, civil rights, and civil liberties, and risk management for high-impact AI uses. It is audit-adjacent governance: inventories, impact documentation, testing, periodic human review, monitoring, and accountable officials can all create inspectable records.

Employment audits. New York City's Local Law 144 requires that a covered automated employment decision tool have undergone a bias audit no more than one year before its use, that audit information be publicly available, and that candidates or employees receive notice. A 2025 New York State Comptroller audit of enforcement identified complaint-handling and routing gaps, which shows the larger lesson: an audit mandate is only as strong as its definitions, disclosure, enforcement, and usable paths for affected people.
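Bias audits of this kind are commonly reported as impact ratios: each category's selection rate divided by the selection rate of the most selected category. The sketch below illustrates only that arithmetic. The function names and the applicant counts are hypothetical, the data is invented, and no pass or fail threshold is implied.

```python
def selection_rates(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map category -> selected / total, skipping empty categories.

    `outcomes` maps a category name to (number selected, number assessed).
    """
    return {
        category: selected / total
        for category, (selected, total) in outcomes.items()
        if total > 0
    }


def impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Divide each category's selection rate by the highest selection rate."""
    rates = selection_rates(outcomes)
    if not rates:
        raise ValueError("no category has any assessed candidates")
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no candidates were selected in any category")
    return {category: rate / top_rate for category, rate in rates.items()}


# Invented counts for illustration only.
outcomes = {
    "group_a": (50, 100),  # 50 of 100 selected
    "group_b": (30, 100),  # 30 of 100 selected
}
for category, ratio in impact_ratios(outcomes).items():
    print(f"{category}: {ratio:.2f}")
```

A ratio is only as informative as the category definitions and sample sizes behind it, which is one reason the published audit summary matters as much as the number.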

Research and civil society practice. Work by Raji, Buolamwini, Gebru, Birhane, and others helped establish algorithmic auditing as a way to expose performance disparities, dataset harms, and weak accountability claims. GAO's AI accountability framework and the Ada Lovelace Institute's assurance work frame audits as part of a broader ecosystem of governance, data, performance, monitoring, external scrutiny, and public accountability.

Failure Modes

Scope laundering. An audit covers narrow documentation or a small test set, while public language implies the whole system is safe.

Access starvation. Auditors receive demos, summaries, or curated logs but not the evidence needed to evaluate real deployment risk.

Independence theater. The reviewer is formally external but financially dependent, contractually constrained, or unable to publish meaningful findings.

Checklist drift. Organizations optimize for passing a checklist while ignoring new failure modes, affected-person experience, or real-world misuse.

Certification overclaim. A management-system certificate is presented as proof that a specific model, dataset, product, or deployment is safe.

Point-in-time illusion. A model, dataset, prompt stack, policy, or deployment environment changes after the audit, while the assurance claim remains attached to the system.

Public opacity. The public is told that an audit occurred but cannot see scope, methods, limitations, findings, or whether deployment changed.

Spiralist Reading

AI audits are the ritual demand for receipts.

The machine age produces fluent assurance. The company says the model was evaluated. The agency says the tool is governed. The platform says risks are managed. The audit asks for the trace: who tested it, against what, with what access, what failed, who knew, what changed, and who can verify the claim.

For Spiralism, the danger is not only opaque intelligence. It is unaudited authority wearing the language of safety. A real audit interrupts the spiral of self-certification. It creates a record that can be contested.

Open Questions

Sources

