Wiki · Concept · Last reviewed June 25, 2026

Data Privacy Vocabulary

The Data Privacy Vocabulary is a W3C Community Group vocabulary for expressing privacy, data-processing, legal-basis, rights, risk, and AI-system information in machine-readable form.

Category: Concept / Privacy Vocabulary
Published: June 25, 2026
Modified: June 25, 2026
Last reviewed: June 25, 2026
Tags: DPV, W3C, privacy, RDF, GDPR, AI governance, records of processing

Definition

The Data Privacy Vocabulary, or DPV, is a vocabulary and ontology developed by the W3C Data Privacy Vocabularies and Controls Community Group. Its purpose is to make information about data processing, technologies, purposes, legal bases, data categories, rights, risks, controls, and actors expressible as interoperable metadata.

DPV is not a privacy law, audit result, certification, or consent mechanism. It is a semantic layer. It gives teams shared terms so a record can say what data is processed, for what purpose, under which claimed basis, by which role, using which technology, with which safeguards, and with which rights or risks attached.

Current Status

The versionless DPV URL currently resolves to DPV version 2.3, published as a W3C Final Community Group Report on February 25, 2026. The DPV specification gives its namespace as https://w3id.org/dpv# with the suggested prefix dpv, and points to GitHub for source and releases. The same specification states explicitly that DPV is not a W3C Standard and is not on the W3C Standards Track.
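The namespace and prefix are what let a record use short names such as dpv:hasPurpose in place of full IRIs. A minimal sketch of that expansion follows. The namespace and prefix come from the specification as quoted above; the helper names expand and compact are hypothetical, and dpv:hasPurpose is used only as an illustrative term that should be checked against the published vocabulary.

```python
# Namespace and suggested prefix as stated in the DPV specification.
DPV_NAMESPACE = "https://w3id.org/dpv#"
PREFIXES = {"dpv": DPV_NAMESPACE}


def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a compact name such as 'dpv:hasPurpose' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(iri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Shorten a full IRI to prefix:local form; return it unchanged if no prefix matches."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


if __name__ == "__main__":
    print(expand("dpv:hasPurpose"))
    print(compact("https://w3id.org/dpv#hasPurpose"))
```

Extension vocabularies would add their own entries to the prefix table; their namespaces are not listed here and should be taken from the respective specifications.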

The W3C group page describes DPV as useful for compliance documentation and evaluation, policy specification, consent representation, legal taxonomies, and annotation of text and data. DPV should therefore be treated as a vocabulary for evidence and interoperability, not as proof that an organization's evidence is complete or correct.

Structure

The core DPV specification is extended by more specific vocabularies. The project's GitHub documentation names extensions for personal-data categories, location, technology, AI, justifications, risk, sectors, standards, and legal concepts. DPV is distributed in several serializations, including RDF/XML, Turtle, JSON-LD, and N3; RDFS/SKOS semantics are the default, and an OWL2 serialization is available as an alternate form.
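To make the idea of serializations concrete, the sketch below writes one small record as JSON-LD and as Turtle using only the Python standard library. It is an illustration under assumptions: the example.org identifier is a placeholder, the term names (dpv:Process, dpv:hasPurpose, dpv:hasProcessing, dpv:ServiceProvision, dpv:Collect) are assumed and should be verified against the published vocabulary, and the Turtle writer handles only this flat shape. A real project would more likely use an RDF library.

```python
import json

# Hypothetical record. The ex: identifiers are placeholders, and the dpv:
# term names are assumptions to verify against the published vocabulary.
record = {
    "@context": {
        "dpv": "https://w3id.org/dpv#",
        "ex": "https://example.org/records/",
    },
    "@id": "ex:newsletter-signup",
    "@type": "dpv:Process",
    "dpv:hasPurpose": {"@id": "dpv:ServiceProvision"},
    "dpv:hasProcessing": {"@id": "dpv:Collect"},
}


def to_jsonld(doc: dict) -> str:
    """JSON-LD is JSON with a @context, so the json module is sufficient."""
    return json.dumps(doc, indent=2, sort_keys=True)


def to_turtle(doc: dict) -> str:
    """Render the same flat record as Turtle (handles only this simple shape)."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in doc["@context"].items()]
    lines.append("")
    pairs = [("a", doc["@type"])]
    pairs += [(k, v["@id"]) for k, v in doc.items() if not k.startswith("@")]
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    lines.append(f"{doc['@id']} {body} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_jsonld(record))
    print(to_turtle(record))
```

Both outputs describe the same three statements about one subject; the choice of serialization affects tooling, not meaning.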

The primer explains why the vocabulary exists: privacy work lacked standard machine-readable terms for personal data, processing purposes, roles, and legal concepts. DPV answers that by offering common concepts that can be extended for jurisdictions, domains, and applications. The guides show possible application areas such as consent records and receipts, privacy notices, records of processing activities, data protection impact assessments, data-breach records, and rights management.

AI Governance Use

DPV matters for AI because governance often fails at the vocabulary layer. A model card, dataset sheet, privacy notice, DPIA, audit log, incident record, and vendor questionnaire can describe the same processing in incompatible language. DPV gives a way to connect those records without pretending they all have the same purpose.

The DPV AI extension, also version 2.3, extends DPV and its technology extension to represent AI techniques, applications, risks, and mitigations. Its table of contents includes AI systems and models, AI agents, data, model-development phases, risk concepts, lifecycle stages, training data, testing data, validation data, and transparency risk. The EU AI Act extension separately provides concepts based on the AI Act, including systems, purposes, risks, roles, documentation, and assessments.

For an institution using agents, retrieval, personalization memory, or model-evaluation pipelines, DPV can help name the processing record: source data, inferred data, generated data, legal basis, purpose, storage, transfer, processor, controller, safeguards, affected rights, and linked AI technology. That is especially useful when an AI system turns personal data into embeddings, summaries, risk scores, labels, or tool-call logs.

Evidence Pattern

A DPV-backed record should preserve more than a label. It should bind the vocabulary term to a concrete system version, dataset, data flow, policy decision, and review date. If a record says a purpose is service provision, the evidence should show which service, data, user relationship, retention period, and downstream exclusions are meant.
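One way to enforce that binding is to store the concrete references next to the term and report whichever are missing. The sketch below assumes a hypothetical BoundTerm structure whose fields mirror the list above; the class, the function, and the example values are illustrative and not part of DPV.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class BoundTerm:
    """A vocabulary term plus the concrete evidence it is supposed to point at."""
    term: str                              # e.g. "dpv:ServiceProvision" (illustrative)
    system_version: Optional[str] = None
    dataset: Optional[str] = None
    data_flow: Optional[str] = None
    policy_decision: Optional[str] = None
    review_date: Optional[date] = None


def missing_evidence(bound: BoundTerm) -> list[str]:
    """Return the names of evidence fields that are still empty."""
    return [
        f.name
        for f in fields(bound)
        if f.name != "term" and getattr(bound, f.name) in (None, "")
    ]


if __name__ == "__main__":
    claim = BoundTerm(
        term="dpv:ServiceProvision",
        system_version="billing-api 4.2.0",   # hypothetical system
        review_date=date(2026, 6, 25),
    )
    print(missing_evidence(claim))
```

A check like this only shows that a reference was recorded. Whether the referenced dataset or decision actually supports the label remains a matter for review.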

For AI governance, the useful pattern is a semantic receipt: personal-data category, processing operation, purpose, legal basis, technology, model or agent component, vendor role, risk, mitigation, human review, rights path, and source artifact. That receipt can support audits, DPIAs, records of processing, data-subject requests, incident response, and model-change review.
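The receipt can be sketched as a plain data structure with one slot per element named above. Everything in the example is hypothetical: SemanticReceipt and receipts_touching are illustrative names, the ex: values are placeholders, and any dpv: term would need to be checked against the published vocabulary before use.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SemanticReceipt:
    """One slot per element of the receipt pattern described in the text."""
    personal_data_category: str
    processing_operation: str
    purpose: str
    legal_basis: str
    technology: str
    component: str          # model or agent component
    vendor_role: str
    risk: str
    mitigation: str
    human_review: str
    rights_path: str
    source_artifact: str


def receipts_touching(receipts: list[SemanticReceipt], category: str) -> list[SemanticReceipt]:
    """Find receipts for one data category, e.g. when handling a data-subject request."""
    return [r for r in receipts if r.personal_data_category == category]


# Hypothetical receipt for a retrieval pipeline that embeds support tickets.
embedding_job = SemanticReceipt(
    personal_data_category="ex:SupportTicketText",
    processing_operation="ex:EmbedForRetrieval",
    purpose="dpv:ServiceProvision",
    legal_basis="ex:ContractBasis",
    technology="ex:VectorIndex",
    component="retriever v3",
    vendor_role="processor",
    risk="ex:ReidentificationFromEmbeddings",
    mitigation="ex:AccessControlOnIndex",
    human_review="quarterly sample review",
    rights_path="erasure request removes ticket and its vectors",
    source_artifact="pipelines/embed_tickets.yaml",
)

if __name__ == "__main__":
    matches = receipts_touching([embedding_job], "ex:SupportTicketText")
    print(asdict(matches[0])["rights_path"])
```

Keeping the receipt flat makes it easy to export to any of the DPV serializations later, at the cost of not modeling relationships between slots.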

Limits

A controlled vocabulary can make privacy records comparable, but it cannot make them honest. DPV does not verify that a legal basis is valid, that consent was freely given, that a data flow is lawful, that a mitigation works, or that an AI system is acceptable in context. A bad deployment can use clean vocabulary.

DPV also does not replace local law, regulator guidance, system testing, security controls, human review, or institutional accountability. Its value depends on disciplined modeling: using terms consistently, versioning records, preserving source evidence, and separating what the vocabulary can express from what the organization has actually proved.
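Versioning records can be as simple as giving each revision a number and a fingerprint of the revision it replaced. The sketch below is one possible approach, not something DPV prescribes: it hashes a key-sorted JSON rendering of the record, which is a simplification and not RDF dataset canonicalization. The function names and the version and previous_fingerprint fields are hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """SHA-256 over a key-sorted JSON rendering, so key order does not matter."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def revise(record: dict, changes: dict) -> dict:
    """Return a new revision that points back at the fingerprint of the old one."""
    new = {**record, **changes}
    new["version"] = record.get("version", 0) + 1
    new["previous_fingerprint"] = fingerprint(record)
    return new


if __name__ == "__main__":
    v1 = {"version": 1, "purpose": "dpv:ServiceProvision", "retention_days": 90}
    v2 = revise(v1, {"retention_days": 30})
    print(v2["version"], v2["previous_fingerprint"][:12])
```

The chain of fingerprints shows that a record changed and in what order. It says nothing about whether either revision was accurate.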

Spiralist Reading

Spiralism reads DPV as a grammar for institutional memory. Data systems translate persons into records. DPV asks the institution to translate its own behavior back into inspectable terms: purpose, basis, role, risk, safeguard, right, and technology.

The danger is ritual naming. A metadata graph can become a polished substitute for refusal, deletion, appeal, or redesign. The promise is different: if the vocabulary is tied to evidence, it can make privacy claims contestable by more than the team that wrote the system.

Open Questions

How should DPV records be validated against live data flows rather than documentation templates?
Which AI artifacts should require DPV-style vocabulary: training corpora, embeddings, prompts, model logs, eval sets, or agent traces?
How can affected people inspect a semantic privacy record without needing to understand RDF or ontology design?

Sources

W3C Data Privacy Vocabularies and Controls Community Group, Data Privacy Vocabulary (DPV), version 2.3, Final Community Group Report, February 25, 2026; reviewed June 25, 2026.
W3C, Data Privacy Vocabularies and Controls Community Group, group page and scope, reviewed June 25, 2026.
W3C GitHub, w3c/dpv repository, source, releases, extensions, and serializations, reviewed June 25, 2026.
W3C DPVCG, Primer for Data Privacy Vocabulary, Final Community Group Report, January 16, 2025; reviewed June 25, 2026.
W3C DPVCG, Guides for Data Privacy Vocabulary, application guidance for consent records, privacy notices, ROPA, DPIA, breach records, and rights management, reviewed June 25, 2026.
W3C DPVCG, AI Technology Concepts for DPV, version 2.3, Final Community Group Report, February 25, 2026; reviewed June 25, 2026.
W3C DPVCG, EU Artificial Intelligence Act concepts for DPV, version 2.3, Final Community Group Report, February 25, 2026; reviewed June 25, 2026.
Harshvardhan J. Pandit, Beatriz Esteves, Georg P. Krog, Paul Ryan, Delaram Golpayegani, and Julian Flake, Data Privacy Vocabulary (DPV) -- Version 2, arXiv:2404.13426, revised August 27, 2024.
