Structured Outputs and Constrained Decoding
Structured outputs are model responses designed or enforced to satisfy a machine-readable contract, such as JSON Schema, a tool-call argument schema, a database query object, or a domain-specific grammar. Constrained decoding is the inference-time technique of restricting which tokens a model may emit so the final response follows a declared schema or grammar.
Snapshot
- Structured output is the interface contract: the response should fit a machine-readable shape such as JSON Schema, a tool-call argument object, a grammar, or a typed record.
- Constrained decoding is one implementation method: the serving system blocks tokens that would violate the active schema or grammar as generation proceeds.
- Common production forms include JSON Schema response formats, strict function-call schemas, grammar-guided generation, and structured fields for evaluation logs or agent handoffs.
- Reliability boundary: a system may guarantee supported syntax or schema adherence, but not factual truth, authorization, policy correctness, or safe downstream action.
- Portability warning: "JSON Schema support" usually means a provider-specific subset, not automatic support for every Draft 2020-12 keyword or validation behavior.
- Governance question: what happens after a valid object enters a workflow, triggers a tool, updates a record, or becomes evidence in an institutional decision?
Definition
Structured-output systems ask a language model to return data that software can parse against an explicit contract. The target may be a JSON object, a list of extracted fields, a typed API call, a user-interface tree, a SQL-like query object, a robot command, or any other representation with explicit syntax.
Constrained decoding, also called constrained sampling or guided generation, enforces part of that contract during generation. Instead of allowing the model to choose any next token from the vocabulary, the decoder masks out tokens that would make the output invalid under the active schema, regular expression, finite-state machine, or context-free grammar.
The distinction matters. A model can be trained, prompted, retried, or post-validated to prefer a format, but constrained decoding changes the runtime search space itself. It can make invalid syntax unreachable for supported constraints, yet it cannot by itself guarantee truth, safety, correct business logic, authorization, or meaningful reasoning.
Not every structured-output feature is pure constrained decoding. Commercial and open-source systems often combine fine-tuning, schema-aware prompting, strict validators, retries, refusal paths, parser generation, and runtime token masks. The practical question is therefore not only whether the object parses, but which layer enforced which part of the contract.
A structured output is still untrusted model output. It should be treated as a candidate record, tool request, extraction, or classification until the receiving system validates the values, checks authority, and decides what consequences may follow.
The schema itself is also part of the model-facing interface. Field names, descriptions, examples, enums, constants, regular expressions, and tool descriptions can steer model behavior. In safety-critical systems, schemas should be reviewed like prompts and API contracts, not treated as neutral plumbing.
Contract Boundary
The contract has at least four layers that should not be collapsed. The first is the formal schema or grammar: for JSON, the public JSON Schema specification identifies Draft 2020-12 as the current version and separates Core from Validation. The second is the provider or library subset actually supported during generation. The third is the application validator used after generation. The fourth is the business, legal, or safety rule that decides what may happen with the validated object.
Provider subsets are normal. OpenAI documents supported schemas and strict-mode requirements. Anthropic documents standard JSON Schema support with limitations, SDK transformations for unsupported constraints, grammar compilation and caching, and separate JSON-output and strict-tool-use modes. Google documents Gemini structured output as supporting a subset of JSON Schema. vLLM documents choice, regex, JSON Schema, context-free grammar, and structural-tag modes, with backend-dependent behavior.
A valid object therefore answers only the narrow question "does this output fit the supported contract as enforced in this run?" It does not answer whether the extracted date is real, whether a cited source supports the claim, whether a tool call is authorized, whether a person should be classified a certain way, or whether the receiving system may lawfully act on the data.
How It Works
Schema or grammar declaration. The application declares the allowed shape of the response. In contemporary LLM applications this is often JSON Schema, because JSON already sits at the boundary between web services, SDKs, databases, and typed application code.
State tracking. During generation, the system tracks where the partial output is inside the schema or grammar. At one step the only valid token may be an opening brace, at another a quoted property name, at another a comma, enum value, number, string, or closing bracket.
Token masking. The decoder computes which tokens are legal next tokens and blocks illegal ones before sampling. This can be implemented with finite-state machines, context-free grammars, pushdown automata, vocabulary indexes, or specialized serving-engine integrations.
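The state-tracking and masking steps can be illustrated with a deliberately tiny sketch. It assumes a hypothetical eight-token vocabulary and a contract with only two valid outputs, and it replaces a real grammar engine with a brute-force prefix check. The function `toy_scores` is a stand-in for a model, not a real API. Production engines compile the schema into automata rather than enumerating outputs.

```python
import math

# Hypothetical toy contract: these two strings are the only valid outputs.
VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}']
# Hypothetical toy vocabulary; real tokenizers have many thousands of tokens.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'maybe', ' ']


def allowed_tokens(prefix):
    """Indexes of tokens that keep the prefix extendable to a valid output."""
    return [
        i for i, tok in enumerate(VOCAB)
        if any(v.startswith(prefix + tok) for v in VALID_OUTPUTS)
    ]


def mask_logits(logits, allowed):
    """Set every disallowed token to -inf so it can never be selected."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else -math.inf for i, x in enumerate(logits)]


def constrained_greedy(score_fn):
    """Greedy decoding under the mask; stops once the output is complete."""
    out = ""
    while out not in VALID_OUTPUTS:
        allowed = allowed_tokens(out)
        if not allowed:
            raise ValueError("no legal continuation for prefix: " + out)
        masked = mask_logits(score_fn(out), allowed)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        out += VOCAB[best]
    return out


def toy_scores(prefix):
    # Stand-in model that always prefers "maybe", which the contract forbids.
    return [0.1, 0.1, 0.2, 0.1, 0.5, 0.4, 9.0, 0.3]


result = constrained_greedy(toy_scores)
```

Here `result` is `{"ok":true}` even though the stand-in model scores `maybe` highest at every step, because the mask removes it before selection. The mask guarantees the shape only; it says nothing about whether `true` is the right value.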
Compilation and caching. Some systems compile schemas into grammar artifacts or other decoder state before generation. That can introduce first-request latency, cache-dependent behavior, privacy constraints on what schema content may contain, and versioning concerns when a schema changes.
Parsing and validation. After generation, applications still need ordinary parsers and validators. Schema-conforming text must be converted into runtime objects, checked against semantic rules, and handled safely if the model refuses, times out, truncates, or returns a valid object with bad content.
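As a minimal sketch of the post-generation step, the function below parses the text and then applies one semantic rule that a shape-only check would miss. The field name `due_date` and the error messages are hypothetical, chosen only for illustration.

```python
import json
from datetime import date


def parse_and_check(raw_text):
    """Return (record, errors); record is None when the text cannot be used."""
    try:
        obj = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Truncated or otherwise malformed text ends up here.
        return None, ["unparseable output: " + exc.msg]
    if not isinstance(obj, dict):
        return None, ["expected a JSON object"]

    errors = []
    due = obj.get("due_date")  # hypothetical field name
    if not isinstance(due, str):
        errors.append("due_date is missing or not a string")
    else:
        try:
            date.fromisoformat(due)
        except ValueError:
            # The string has the right shape but is not a real calendar date.
            errors.append("due_date is not a real calendar date: " + due)
    return obj, errors
```

A value such as `"2025-02-30"` is a well-formed string in a well-formed object, yet it fails the calendar check, which is the kind of gap between schema conformance and usable content that the receiver has to close.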
Failure representation. A production schema should say how uncertainty, abstention, refusal, missing evidence, and partial completion are represented. If the only allowed shape is a confident-looking answer, constrained decoding can force uncertainty into misleading fields.
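One way to make these states explicit is to give them their own field instead of forcing every response into an answer. The schema and helper below are a sketch; the field names and status values are hypothetical, not taken from any provider's documentation.

```python
# Hypothetical response schema with explicit non-answer states.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["answered", "insufficient_evidence", "refused", "partial"],
        },
        "value": {"type": ["string", "null"]},
        "evidence_span": {"type": ["string", "null"]},
    },
    "required": ["status", "value", "evidence_span"],
    "additionalProperties": False,
}


def usable_value(obj):
    """Return the value only when an answer is claimed and evidence is given."""
    if obj.get("status") != "answered":
        return None
    if not obj.get("value") or not obj.get("evidence_span"):
        return None
    return obj["value"]
```

With this shape, a receiver can route `insufficient_evidence`, `refused`, and `partial` responses to review or retry paths instead of storing them as if they were confident answers.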
Why It Matters
Structured outputs are one of the quiet bridges between chatbots and operational software. Free-form text is flexible for humans but awkward for programs. A valid JSON object can be routed, stored, audited, displayed, transformed into a typed object, used as a tool call, or passed into another system.
OpenAI's Structured Outputs documentation made this pattern highly visible in commercial APIs by adding stricter JSON Schema adherence for function calls and response formats. OpenAI distinguishes the feature from the earlier JSON mode: JSON mode targets syntactically valid JSON, while Structured Outputs aim at adherence to the supplied schema, for supported schemas.
By 2026, the pattern had spread beyond one provider. Anthropic documents JSON-schema outputs and strict tool use for Claude. Google documents structured output for the Gemini API with a subset of JSON Schema. Open-source serving stacks such as vLLM expose guided decoding for choices, regular expressions, JSON Schema, and grammars. The shared movement is practical: make the output boundary more like an API contract and less like a wish embedded in a prompt.
Current Context
As of June 25, 2026, OpenAI documents Structured Outputs in two main API patterns: function calling, where schemas describe tool arguments, and JSON Schema response formats, where schemas describe the assistant's response. Its docs distinguish Structured Outputs from JSON mode: JSON mode aims at valid JSON, while Structured Outputs aim at adherence to a supplied schema where supported.
OpenAI's current function-calling docs make strictness operational rather than rhetorical. Strict schemas require every object to set additionalProperties to false and mark all fields as required, with nullable types used for optional fields. If strict: true is requested with an incompatible schema, the request is rejected. If strictness is omitted, Responses requests may try to normalize the schema into strict mode and fall back to non-strict, best-effort function calling when they cannot; Chat Completions remains non-strict by default. That means teams should log the effective strictness returned by the API, not only the flag they intended to send.
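A team can catch the two requirements named above before sending a request. The following is a local lint sketch only: it is not OpenAI's validator, it checks just `additionalProperties` and `required`, and it walks only `properties` and `items`, which is an assumption made to keep the example short.

```python
def strict_mode_problems(schema, path="$"):
    """List places where a schema breaks the two strict-mode rules above."""
    problems = []
    if not isinstance(schema, dict):
        return problems

    types = schema.get("type")
    type_list = types if isinstance(types, list) else [types]
    if "object" in type_list or "properties" in schema:
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(path + ": additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(path + ": not listed in required: " + ", ".join(missing))
        for name, sub in props.items():
            problems.extend(strict_mode_problems(sub, path + "." + name))

    if "items" in schema:
        problems.extend(strict_mode_problems(schema["items"], path + "[]"))
    return problems


# Optional field left out of "required" and no additionalProperties: two problems.
LOOSE = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "nickname": {"type": "string"}},
    "required": ["name"],
}

# Same shape rewritten so the optional field is required but nullable.
STRICT = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "nickname": {"type": ["string", "null"]},
    },
    "required": ["name", "nickname"],
    "additionalProperties": False,
}
```

Calling `strict_mode_problems(LOOSE)` reports both issues, while `strict_mode_problems(STRICT)` returns an empty list. A clean result from this lint does not prove the provider will accept the schema; the effective strictness reported by the API remains the authoritative signal.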
Provider support is converging on the same shape but not on identical semantics. Anthropic describes JSON outputs and strict tool use as grammar-constrained features with schema limitations, invalid-output handling, compiled grammar caching, prompt-cache effects, and data-retention cautions for schema content. Google describes Gemini structured output as JSON generation with a supported subset of JSON Schema. vLLM documents structured-output modes for choices, regular expressions, JSON Schema, context-free grammars, and structural tags. A portable application therefore needs an explicit compatibility layer rather than assuming that "JSON Schema support" means the full JSON Schema specification.
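A compatibility layer can start as a simple keyword audit run before a schema is sent to a given target. In the sketch below the target names and keyword sets are invented placeholders, not a statement of what any provider or library actually supports; real values would have to come from each target's current documentation.

```python
# Hypothetical per-target keyword allow-lists. These are placeholders only.
BASE_KEYWORDS = {
    "type", "properties", "required", "additionalProperties",
    "items", "enum", "description",
}
TARGET_KEYWORDS = {
    "target_a": BASE_KEYWORDS,
    "target_b": BASE_KEYWORDS | {"minimum", "maximum", "pattern"},
}


def unsupported_keywords(schema, supported, path="$"):
    """Return paths of schema keywords that are not in the supported set."""
    found = []
    if not isinstance(schema, dict):
        return found
    for key, value in schema.items():
        if key not in supported:
            found.append(path + "." + key)
            continue
        if key == "properties":
            # Property names are data, not keywords, so recurse into the values.
            for name, sub in value.items():
                found.extend(
                    unsupported_keywords(sub, supported, path + ".properties." + name)
                )
        elif key == "items":
            found.extend(unsupported_keywords(value, supported, path + ".items"))
    return found


SCHEMA = {
    "type": "object",
    "properties": {
        "age": {"type": "integer", "minimum": 0},
        "code": {"type": "string", "pattern": "^[A-Z]{3}$"},
    },
    "required": ["age", "code"],
    "additionalProperties": False,
}

report = {
    name: unsupported_keywords(SCHEMA, keywords)
    for name, keywords in TARGET_KEYWORDS.items()
}
```

The audit makes the portability gap visible per target, so a team can decide explicitly whether to reject the schema, simplify it, or enforce the missing constraints in application code.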
Research and open-source systems frame the issue as runtime infrastructure. Outlines uses finite-state-machine-style guidance for regular expressions and grammars, SGLang includes structured-output decoding in a broader language-model-program runtime, and XGrammar focuses on efficient context-free-grammar execution. These sources support the narrow claim that constrained generation is a serving-layer discipline; they do not prove that generated fields are true or safe.
Applications
Tool calls and agents. Function calling depends on structured arguments. If an agent is going to call a calendar API, payment function, database lookup, code tool, or robot controller, the receiving system needs fields it can parse and validate.
Information extraction. Structured outputs are used to extract names, dates, citations, medical fields, legal clauses, addresses, product attributes, evidence labels, or action items from unstructured text and documents.
Interface generation. A model can emit a tree of UI components, form fields, workflow steps, or configuration objects that a frontend or automation system renders later.
Evaluation and logging. Benchmarks, red-team systems, and monitoring pipelines often need model judgments in stable fields: score, rationale, category, severity, evidence span, refusal reason, or recommended escalation.
Program and query generation. Constraints can help keep generated code, JSON, XML, text that must match a regular expression, or domain-specific commands syntactically valid, so that downstream checks have well-formed input to work on.
Safe Receiver Pattern
The safer architecture treats constrained decoding as the first gate, not the last one. The receiving system should parse the object, validate it with an independent validator, run semantic checks, verify cited sources, authorize the requested action, and record the schema, model, tool, and validator versions before execution.
For tool calls, the receiver should map each generated field to an allowed capability instead of passing model-provided values directly into APIs. File paths, URLs, SQL fragments, account numbers, shell arguments, destination addresses, and external identifiers should be allow-listed, resolved server-side, or rejected when they fall outside the user's authority.
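A minimal sketch of this mapping follows. The report names, file paths, user names, and grant table are all hypothetical; the point is that the model supplies only a key, and the server decides what the key resolves to and whether the user may use it.

```python
# Hypothetical server-side tables; the model never supplies a path directly.
ALLOWED_REPORTS = {
    "q1_summary": "/srv/reports/q1_summary.pdf",
    "q2_summary": "/srv/reports/q2_summary.pdf",
}
USER_GRANTS = {
    "alice": {"q1_summary"},
}


def resolve_report(user, args):
    """Map a model-generated argument object to a server-chosen file path."""
    name = args.get("report_name")
    if not isinstance(name, str) or name not in ALLOWED_REPORTS:
        raise PermissionError("unknown report requested")
    if name not in USER_GRANTS.get(user, set()):
        raise PermissionError("user is not authorized for this report")
    return ALLOWED_REPORTS[name]
```

A schema-valid argument such as `{"report_name": "../../etc/passwd"}` is rejected because it is not a known key, and `{"report_name": "q2_summary"}` is rejected for `alice` because the grant table, not the model, defines her authority.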
For extraction and classification, schemas should include uncertainty, evidence spans, source identifiers, and abstention or refusal fields when the workflow depends on factual correctness. A perfectly valid object with no evidence path should not become a business record, legal assertion, medical claim, or public-service decision.
For provider SDKs that simplify schemas before sending them to the model, the receiver should validate against the application's full schema after generation. A constraint moved into a field description may help the model, but it is not the same as a deterministic runtime check.
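The sketch below shows what re-checking such constraints on the application side might look like. The rule table, field names, and limits are hypothetical, and the checker is a toy that handles only a pattern and numeric bounds; a real deployment would more likely run a full JSON Schema validator against the application's complete schema.

```python
import re

# Hypothetical application-side rules that the model-facing schema may have
# carried only as natural-language descriptions.
APP_RULES = {
    "invoice_id": {"pattern": r"INV-\d{6}"},
    "amount": {"minimum": 0, "maximum": 10000},
}


def check_dropped_constraints(obj, rules=APP_RULES):
    """Deterministically re-check constraints after generation."""
    errors = []
    for field, rule in rules.items():
        value = obj.get(field)
        if "pattern" in rule:
            # Note: this toy anchors the pattern with fullmatch; the JSON
            # Schema "pattern" keyword itself is not implicitly anchored.
            if not isinstance(value, str) or not re.fullmatch(rule["pattern"], value):
                errors.append(field + ": does not match required pattern")
        if "minimum" in rule or "maximum" in rule:
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(field + ": not a number")
                continue
            if "minimum" in rule and value < rule["minimum"]:
                errors.append(field + ": below minimum")
            if "maximum" in rule and value > rule["maximum"]:
                errors.append(field + ": above maximum")
    return errors
```

An object like `{"invoice_id": "invoice 12", "amount": -5}` could pass a simplified model-facing schema that only requires a string and a number, but it fails both checks here.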
Limits and Failure Modes
Syntax is not semantics. A valid object can still contain the wrong date, wrong person, wrong citation, unsafe command, biased classification, or unsupported inference. Structured output can make bad information easier to pipe into real systems.
Supported-schema gaps. Providers and libraries often support only subsets of JSON Schema or grammar features. Recursive structures, references, unions, numeric bounds, regex constraints, and large schemas may behave differently across systems.
Strict-mode mismatch. A schema that works in one API, model, library, or server version may fall back to best effort, reject the request, ignore a constraint, or need a stricter subset elsewhere. Teams should log the effective strict or non-strict status, schema-processing errors, and fallback behavior, not only the requested setting.
Constraint laundering. Some SDKs or providers may remove unsupported constraints and express them as natural-language descriptions while preserving validation in client code. That can be useful, but it means the model-facing contract and the application validator are no longer identical.
Schema drift. The JSON Schema, application types, database schema, SDK helper classes, and validation rules can diverge. Drift is especially dangerous when a generated object is later treated as a typed fact instead of untrusted model output.
Enum pressure. If every output must choose one enum label, the model may map ambiguous evidence into the least-wrong category. High-stakes schemas need "unknown," "insufficient evidence," or escalation states when forced classification would be misleading.
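Quality tradeoffs. Token masking removes illegal tokens and redistributes their probability over the tokens that remain legal, so the constrained next-token distribution can differ from what the model would have produced unconstrained. In some cases the model may produce valid but lower-quality content, overfit to enum labels, or choose a syntactically legal path that avoids the harder answer.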
Latency and serving cost. Complex grammars can add overhead because the decoder must compute valid-token masks at each step. Systems such as SGLang and XGrammar are important partly because they try to make structured generation fast enough for production workloads.
Refusal handling. Safety systems may require the model to refuse a request rather than satisfy the requested schema. Applications must represent refusal, truncation, and incomplete output explicitly instead of assuming every call returns usable data.
Schema privacy. Schemas can reveal field names, enums, business processes, internal policy categories, or sensitive domain concepts. Because providers may compile or cache schema artifacts differently from prompts and responses, teams should keep personal data, secrets, and regulated facts out of schema names, descriptions, enum values, and regular expressions.
Action laundering. A valid tool-call object can make a model decision look like ordinary software input. The receiver still needs authorization, policy checks, rate limits, user confirmation, and rollback paths before carrying out side effects.
Validation complacency. Developers may treat schema adherence as correctness. The safer rule is that constrained decoding is one layer in a pipeline that also needs type validation, authorization, policy checks, business logic, audit logging, and human review where stakes are high.
Infrastructure
Structured outputs sit at the intersection of model APIs, inference engines, schema standards, parsers, SDKs, and application frameworks. JSON Schema provides a shared language for object shape and validation; its current specification version is Draft 2020-12, split across Core and Validation. Constrained-generation libraries then translate some subset of that language into runtime decoding constraints.
Production systems should treat supported schema subsets as product behavior, not as identical to the full JSON Schema specification. The effective contract includes the schema dialect, provider or library version, model family, strictness flag, parser behavior, retry policy, schema-caching behavior, and what happens when a request refuses, times out, or exceeds length limits.
Research systems show several implementation paths. The Outlines paper reformulated neural text generation as transitions through finite-state-machine states and used a vocabulary index to guide generation with regular expressions and context-free grammars. SGLang combined a frontend language for structured model programs with a runtime that includes compressed finite-state machines for structured decoding. XGrammar focused on efficient context-free-grammar execution and reported large speedups by precomputing context-independent token checks and co-designing grammar execution with inference engines.
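The vocabulary-index idea can be sketched in a few lines. This is an illustration in the spirit of that approach, not the Outlines implementation: it hand-codes a three-state automaton for optionally signed integers and uses a hypothetical six-token vocabulary. For each automaton state it precomputes which whole tokens are legal and which state they lead to, so the per-step mask becomes a dictionary lookup.

```python
DIGITS = "0123456789"
STATES = (0, 1, 2)  # 0: start, 1: after "-", 2: at least one digit (accepting)


def step(state, ch):
    """Character-level transition; returns None when the character is illegal."""
    if ch == "-" and state == 0:
        return 1
    if ch in DIGITS:
        return 2
    return None


def walk(state, token):
    """Feed a whole token through the automaton, character by character."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def build_index(vocab):
    """Map each state to {token: next_state} for every token legal in it."""
    index = {}
    for state in STATES:
        index[state] = {}
        for tok in vocab:
            nxt = walk(state, tok)
            if nxt is not None:
                index[state][tok] = nxt
    return index


# Hypothetical vocabulary with multi-character tokens.
VOCAB = ["-", "1", "23", "4-", "a", "-5"]
INDEX = build_index(VOCAB)
```

After this one-time pass, `INDEX[0]` contains `-`, `1`, `23`, and `-5`, while `INDEX[2]` contains only `1` and `23`. Tokens such as `4-` never appear because no state can consume all of their characters.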
Benchmarks are beginning to measure the layer directly. JSONSchemaBench evaluates constrained decoding systems for coverage, efficiency, and output quality across real-world JSON schemas and the official JSON Schema test suite. That matters because production reliability depends not only on whether a model can follow one demo schema, but on whether the serving stack can handle messy schemas at scale.
Minimum Record
A governed structured-output system should preserve enough evidence to reconstruct both the contract and the consequence of a generated object.
- Schema identity: schema name, version, dialect, provider subset, hash, owner, approval date, and change history.
- Generation context: model, API surface, provider or serving engine, strictness setting, decoding backend, schema-cache status where available, temperature or sampling settings, and refusal or truncation path.
- Validation context: parser, validator library, full application schema, semantic checks, source-verification rules, and what happens when provider and application validation disagree.
- Authority context: tool permissions, user or institutional authorization, human-review trigger, policy gate, rate limit, idempotency rule, and rollback path.
- Evidence context: source documents, retrieved spans, citations, confidence or uncertainty fields, abstentions, model-visible tool outputs, and final object stored or acted on.
- Incident context: rejected outputs, invalid objects, schema fallback, blocked tool calls, downstream errors, user disputes, and links to AI Audit Trails or AI Incident Reporting.
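A record of this kind can be sketched as a small immutable structure. The field names below are a hypothetical subset of the list above, and the hash uses sorted-key JSON as a simple normalization, which is an assumption for illustration and not a formal canonicalization standard.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def schema_hash(schema):
    """Hash a schema so that key order does not change the identity."""
    normalized = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class GenerationRecord:
    schema_name: str
    schema_version: str
    schema_sha256: str
    model: str
    requested_strict: bool
    effective_strict: bool   # what the API reported, not what was asked for
    outcome: str             # e.g. "ok", "refused", "truncated", "invalid"
    validator_version: str
    action_taken: str        # downstream consequence, or "none"


schema = {"type": "object", "properties": {"a": {"type": "string"}}}
record = GenerationRecord(
    schema_name="example_extraction",      # hypothetical values throughout
    schema_version="1.0.0",
    schema_sha256=schema_hash(schema),
    model="example-model",
    requested_strict=True,
    effective_strict=True,
    outcome="ok",
    validator_version="0.1.0",
    action_taken="none",
)
log_line = json.dumps(asdict(record), sort_keys=True)
```

Keeping `requested_strict` and `effective_strict` as separate fields makes a silent fallback visible in the log instead of leaving it to be inferred later.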
Governance Requirements
Applications that use structured outputs should document the schema, schema dialect, model, serving engine, strictness setting, refusal path, validation rules, retries, schema cache behavior, and downstream side effects. A generated object should be traceable to the prompt, schema version, model version, validator version, and tool or workflow that consumed it.
High-stakes deployments should separate syntactic validation from authorization and semantic validation. For example, a valid payment instruction still needs account permissions, fraud checks, user confirmation, idempotency controls, and audit records. A valid medical extraction still needs clinician review. A valid legal citation still needs source verification. Where a structured-output system is part of a high-risk AI system, formal record-keeping and human-oversight duties may also apply.
Security review should treat structured outputs as an output-handling and excessive-agency surface. A valid JSON object can still carry malicious strings, unsafe file paths, SQL fragments, overbroad API parameters, unauthorized account identifiers, or prompt-injection residue copied from a source document. The receiving code must validate, encode, authorize, and constrain effects before execution.
Organizations should put structured-output schemas in the AI system inventory and the AI procurement or assurance record when they mediate consequential actions. Incident review should preserve the prompt, schema, generated object, validator result, tool authorization decision, and downstream action so failures can be reconstructed rather than explained from a screenshot.
Teams should test adversarial and edge cases: ambiguous inputs, missing fields, malicious instructions embedded in source text, long strings, enum pressure, schema evolution, unsupported constraints, contradictory evidence, schema-complexity limits, refusal-triggering requests, and tool-call arguments that are syntactically valid but operationally unsafe.
Evaluations should report schema adherence separately from field accuracy, source support, abstention behavior, policy compliance, and downstream task success. A system can be excellent at producing valid JSON and still poor at extracting evidence, classifying people fairly, following policy, or deciding whether an action should be taken.
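A minimal sketch of reporting two of these numbers separately follows. The sample format is hypothetical, and computing field accuracy only over schema-valid samples is one design choice among several; an evaluation could equally count invalid samples as all-fields-wrong.

```python
def score_run(samples):
    """Report schema adherence and field accuracy as separate metrics.

    Each sample is a dict with 'schema_valid' (bool) plus 'predicted' and
    'expected' (dicts of field name to value). This layout is hypothetical.
    """
    total = len(samples)
    valid = [s for s in samples if s["schema_valid"]]
    field_total = 0
    field_correct = 0
    for s in valid:
        for name, expected in s["expected"].items():
            field_total += 1
            if s["predicted"].get(name) == expected:
                field_correct += 1
    return {
        "schema_adherence": len(valid) / total if total else 0.0,
        "field_accuracy": field_correct / field_total if field_total else 0.0,
    }


samples = [
    {
        "schema_valid": True,
        "predicted": {"date": "2024-05-01", "party": "Acme"},
        "expected": {"date": "2024-05-01", "party": "Acme"},
    },
    {
        "schema_valid": True,
        "predicted": {"date": "2024-01-05", "party": "Acme Ltd"},
        "expected": {"date": "2024-05-01", "party": "Apex"},
    },
]
scores = score_run(samples)
```

In this toy run every object is schema-valid, so adherence is 1.0, while only two of four fields are right, so field accuracy is 0.5. Reporting a single blended number would hide that gap.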
Privacy and security review should also cover schema content. Provider documentation may treat schema artifacts differently from prompts and responses for caching or retention. Sensitive facts, protected health information, secrets, customer names, internal policy labels, or exploit patterns should not be placed in schema property names, descriptions, enum values, constants, or regular expressions unless the retention and access path has been reviewed.
Source Discipline
Use provider documentation for provider-specific behavior, official specifications for schema semantics, and papers or benchmark repos for claims about algorithms and performance. A blog post or API guide may describe a product guarantee, but that guarantee is bounded by the models, APIs, schema subset, strictness defaults, refusal behavior, caching rules, and dates named in the source.
Do not cite "guaranteed schema adherence" as proof of truth, safety, fairness, or institutional legitimacy. The source-supported claim is narrower: under documented conditions, the output is expected to match a supported structure. Truth checking, authorization, source provenance, privacy review, and accountability remain separate layers.
Spiralist Reading
Structured outputs are the moment the oracle becomes a clerk.
A free-form answer persuades. A structured object moves. It enters the workflow, trips the condition, fills the database, calls the tool, updates the record, and leaves a trace that looks more official because it is parseable.
For Spiralism, this is a power transition. The machine is no longer only speaking in human language; it is speaking in institutional forms. The schema becomes a gate through which synthetic judgment enters software, bureaucracy, commerce, medicine, law, logistics, and public administration.
The healthy version is narrow, inspectable, validated, and reversible. The unhealthy version is schema-shaped authority: a fluent model emits a perfectly valid object, and the institution mistakes parseability for truth.
Open Questions
- How should developers measure the semantic correctness of structured outputs, not only schema validity?
- Which schema features should production providers support, and how should unsupported constraints fail visibly?
- Can constrained decoding preserve model quality across complex schemas, or does it sometimes steer models toward shallow but valid answers?
- How should refusal, uncertainty, and insufficient evidence be represented in schemas without hiding them as ordinary fields?
- What audit trail is required when a structured output triggers side effects in financial, medical, legal, or public-sector systems?
Related Pages
- Tool Use and Function Calling
- AI Agent Sandboxing
- AI Agent Observability
- AI Agents
- AI Governance
- AI System Inventory
- AI Procurement
- AI Change Management
- AI Post-Market Monitoring
- System Prompts
- Model Context Protocol
- AI Coding Agents
- AI Evaluations
- AI Audit Trails
- AI Incident Reporting
- LLM-as-a-Judge
- Prompt Injection
- Context Poisoning
- Retrieval-Augmented Generation
- AI Hallucinations
- Secure AI System Development
- AI Control
- AI Audits and Assurance
- AI Red Teaming
- Model Cards and System Cards
- LLM Serving and KV Cache
- vLLM
- AI Compiler Stacks
- OpenAI
- Anthropic
- Human Oversight in AI
- AI Liability and Accountability
- Agent Tool Permission Protocol
- Claim Hygiene Protocol
Sources
- OpenAI API Docs, Structured model outputs, reviewed June 25, 2026.
- OpenAI API Docs, Function calling strict mode, reviewed June 25, 2026.
- Anthropic Docs, Structured outputs, reviewed June 25, 2026.
- Google AI for Developers, Structured outputs, reviewed June 25, 2026.
- vLLM Documentation, Structured Outputs, reviewed June 25, 2026.
- JSON Schema, Specification, reviewed June 25, 2026.
- Brandon T. Willard and Rémi Louf, Efficient Guided Generation for Large Language Models, arXiv, 2023.
- Lianmin Zheng et al., SGLang: Efficient Execution of Structured Language Model Programs, arXiv, 2023; revised 2024.
- Yixin Dong et al., XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models, arXiv, 2024; revised 2025.
- Saibo Geng et al., JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models, arXiv, 2025.
- Yang Xie et al., "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output, arXiv, 2024.
- NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, July 2024; updated April 8, 2026; reviewed June 25, 2026.
- OWASP Gen AI Security Project, 2025 Top 10 Risk & Mitigations for LLMs and Gen AI Apps, reviewed June 25, 2026.
- European Commission AI Act Service Desk, Article 12: Record-keeping and Article 14: Human oversight, Regulation (EU) 2024/1689, reviewed June 25, 2026.