Wiki · Concept · Last reviewed June 25, 2026

OpenTelemetry GenAI Semantic Conventions

OpenTelemetry GenAI semantic conventions are a shared vocabulary for describing generative-AI telemetry: spans, metrics, and events for model calls, agents, tools, retrieval, token use, evaluations, and provider-specific behavior.

Category: Concept Updated: June 25, 2026 Tags: AI agents, observability, OpenTelemetry, GenAI, tracing, audit trails

Definition

OpenTelemetry semantic conventions define common attributes that give meaning to telemetry when systems collect, produce, and consume it. The core OpenTelemetry documentation describes this as a common naming scheme for span names and kinds, metric instruments and units, attribute names, attribute types, meanings, and valid values. As of review on June 25, 2026, the public OpenTelemetry semantic-conventions page identified version 1.42.0 and stated that the Generative AI section had moved to a separate OpenTelemetry GenAI semantic-conventions repository.

The GenAI repository narrows that general discipline to AI applications. Its README says it covers semantic conventions for Generative AI, including spans, metrics, and events for GenAI clients, Model Context Protocol (MCP), and provider-specific conventions such as OpenAI. It also says the repository extends core OpenTelemetry semantic conventions, uses Weaver to manage dependencies, keeps human-readable documents under docs/, and stores generated YAML definitions under model/.

For AI Agent Observability, this matters because a model call, retrieval step, tool invocation, workflow, and evaluation should not be recorded as unrelated log strings. A convention gives those records comparable names.

How It Works

Instrumentation emits OpenTelemetry traces, spans, metrics, and events with gen_ai.* attributes. The moved OpenTelemetry registry page lists the shape of the vocabulary: agent description, ID, name, and version; conversation ID; data-source ID; embeddings dimension count; evaluation name, score, label, and explanation; input and output messages; operation name; prompt name; provider name; request settings; response model and finish reasons; retrieval documents and query text; system instructions; token counts; tool definitions; tool-call arguments and results; and workflow name.

That list is not a magic audit trail. It is a schema surface. gen_ai.operation.name can distinguish a chat operation from embeddings, retrieval, tool execution, agent invocation, or workflow invocation. gen_ai.request.model and gen_ai.response.model can separate the model requested from the model reported by the provider. gen_ai.usage.input_tokens and gen_ai.usage.output_tokens can support cost and load monitoring. Tool and retrieval fields can make the boundary between model text, tool arguments, tool results, and retrieved documents visible.

Agent Context

Agents turn telemetry from a debugging convenience into an accountability layer. A useful run record needs one thread through the user's request, prompt version, selected tools, retrieval sources, model calls, tool results, approvals, retries, failures, final answer, and side effects. GenAI semantic conventions help make those events machine-readable across libraries and vendors.

The convention is especially useful when agents call external tools or MCP servers. A trace can show that one span represented a model request, another represented tool selection, another represented tool execution, and another represented retrieval. That distinction helps an incident reviewer ask whether harm came from bad source material, malformed tool arguments, unsafe authorization, provider behavior, or a human approval gap.

The convention does not prove that an agent was correct, authorized, safe, or well governed. It only improves the grammar of evidence.

Governance and Safety

The same fields that make agent runs reviewable can also create a sensitive archive. The OpenTelemetry GenAI registry warns that input messages and output messages are likely to contain sensitive user or personal data. It also warns that retrieval query text, system instructions, tool-call arguments, and tool-call results may contain sensitive information. The tool-definitions note adds a second operational concern: definitions can be large, so optional properties should not be populated by default unless instrumentation deliberately enables them.

Governance should therefore treat GenAI telemetry as controlled evidence, not neutral exhaust. Production defaults should favor metadata, IDs, hashes, references, and redacted samples over raw prompts and full tool payloads. Debug traces, legal holds, security investigations, and user-facing receipts may need different retention windows and access controls.

Organizations should also version the schema they use. A page that says "we log OpenTelemetry" is too vague. A serious record should name the semantic-convention version or repository commit, instrumentation library, sampling rule, redaction rule, retention class, and source system.

Defense Pattern

Pin the convention. Record the OpenTelemetry semantic-convention version or GenAI repository commit used by the instrumentation.
Trace the whole run. Keep one run identifier across prompts, model calls, retrieval, tools, approvals, and side effects.
Preserve boundaries. Distinguish user input, system instructions, retrieved documents, model output, tool arguments, tool results, and human approvals.
Redact by default. Treat messages, system instructions, retrieval queries, and tool payloads as sensitive unless a documented purpose requires capture.
Separate evidence tiers. Product metrics, developer debugging, security monitoring, legal evidence, and user receipts should not share one retention rule.
Test reconstructability. Sample real incidents or drills and verify that reviewers can reconstruct the run without exposing unnecessary private content.

Source Discipline

Use the terms precisely. OpenTelemetry core semantic conventions are the general telemetry vocabulary. The OpenTelemetry GenAI semantic-conventions repository is the AI-specific extension. A vendor dashboard, an OpenInference trace, an audit log, and a legal event log may use related ideas, but they are not the same artifact.

Also distinguish moved attributes from current source of truth. The official OpenTelemetry registry page still exposes useful attribute names and warnings, but it marks the GenAI attributes as moved to the separate repository. A current citation should point readers to that repository as well as the moved registry page.

Spiralist Reading

Spiralism reads semantic conventions as grammar for machine action. Without grammar, an agent run becomes a pile of messages, timings, and vendor-specific blobs. With grammar, a reviewer can ask sharper questions: what was instruction, what was evidence, what was a tool, what was a side effect, and what was only model output?

The convention does not make the machine truthful. It makes parts of the record comparable enough for responsibility to find a foothold.

Open Questions

Which gen_ai.* fields should be mandatory for high-risk agent runs?
How much raw prompt, retrieved content, and tool payload should production systems record by default?
How should proprietary provider APIs map their fields into shared OpenTelemetry conventions without losing forensic detail?
Can user-facing run receipts be generated from the same telemetry without exposing secrets or unrelated personal data?

Sources

OpenTelemetry, OpenTelemetry semantic conventions 1.42.0, reviewed June 25, 2026.
OpenTelemetry, OpenTelemetry GenAI semantic conventions repository, reviewed June 25, 2026.
OpenTelemetry, GenAI semantic convention attributes, moved-attribute registry reviewed June 25, 2026.
OpenTelemetry, Moved: Generative AI semantic conventions, reviewed June 25, 2026.

Return to Wiki