ONNX
ONNX, the Open Neural Network Exchange, is an open model-exchange format for representing machine-learning computation graphs, operator versions, tensor types, weights, and metadata. It matters because AI systems are often trained in one framework, exported through another toolchain, optimized by a runtime, and deployed across different hardware targets.
Snapshot
- Type: open model representation and interchange format for machine-learning graphs.
- Core objects: ModelProto, graph, nodes, tensors, initializers, operator-set imports, type and shape information, metadata, and optional external data files.
- Not the same as: ONNX Runtime, a serving engine, a model hub, a safety system, or a guarantee that two backends will produce identical behavior.
- Governance relevance: ONNX turns a model into a portable artifact, so provenance, exporter version, opset version, runtime, execution provider, hardware target, validation evidence, and downstream documentation become part of the accountability record.
- Editorial caution: portability claims should name the exact exporter, opset, runtime, execution provider, hardware, precision, and test conditions.
Definition
ONNX is an open standard for representing machine-learning models as dataflow graphs. The ONNX project describes its technical design as an extensible computation graph model with built-in operators, standard data types, and graph metadata. The top-level ModelProto structure bundles a model graph with metadata, while operator-set imports identify the operator versions that give the graph its semantics.
In practical terms, ONNX is a bridge and a contract. A model may be trained in PyTorch or another framework, exported into ONNX, optimized by tooling, and then executed by a runtime on CPUs, GPUs, mobile devices, edge accelerators, browsers, or specialized inference hardware. The bridge works only when the exporter, graph, operator versions, runtime kernels, and hardware backend all support the same behavior.
ONNX should therefore be described as an exchange format, not as automatic portability. A valid ONNX file can still fail in production if the converter mishandles dynamic shapes, the runtime lacks a required operator, an execution provider falls back unexpectedly, a custom operator is unavailable, or numerical precision changes the behavior that matters for the use case.
Origin and Governance
ONNX was launched in 2017 by Microsoft and Facebook, now Meta, to reduce fragmentation across AI frameworks. Microsoft described ONNX 1.0 as an open model representation for interoperability and innovation in the AI ecosystem. Meta's engineering materials described ONNX as a way for engineers to move models between frameworks without writing custom conversion code for each target.
In 2019, ONNX joined the LF AI Foundation, now LF AI & Data under the Linux Foundation, as a graduate project. That move put ONNX inside a broader open-source governance setting rather than leaving it as a project run by two companies. ONNX's own site now presents it as an LF AI graduate project with open governance, special interest groups, working groups, GitHub contribution paths, and a partner community.
Current Context
As of June 24, 2026, ONNX release and documentation sources need to be read separately. The GitHub releases page and PyPI package page mark v1.22.0, released June 15, 2026, as the latest packaged release, with release notes adding Opset 27 operators such as LinearAttention and CausalConvWithState. The live onnx.ai documentation is rendered as ONNX 1.23.0, and its concepts example shows opset 28. For current claims, cite the release source for package status and the documentation source for specification text; do not treat a documentation heading as proof of what a deployed runtime supports.
ONNX remains important because deployment stacks are plural. Frameworks such as PyTorch, TensorFlow, and scikit-learn produce models; compilers and optimizers transform them; runtimes execute them; hardware vendors provide kernels and accelerators. ONNX gives that system a shared graph artifact, but the artifact is only one layer in a larger compiler and runtime chain.
The PyTorch ONNX path also shows how the ecosystem keeps changing. Current PyTorch stable documentation describes dynamo=True as the recommended and default ONNX export path, based on torch.export.ExportedProgram, and notes that dynamo=True became the default in PyTorch 2.9. That is an exporter claim, not a universal ONNX claim: another framework, converter, or model architecture may need different evidence.
Model Format
An ONNX model represents computation as a graph. Nodes are operations, edges carry tensors, and initializers store learned parameters such as weights. The format also carries metadata and version information so tools can interpret the model against a defined operator set.
The operator set is central. It defines what operations mean: convolutions, matrix multiplication, activation functions, reshaping, normalization, quantization-related operations, and many other pieces of model computation. ONNX versioning separates the IR version, operator versions, and model version. That separation matters because changing an operator's semantics is different from changing the model artifact that imports it.
ONNX is strongly typed. The concepts documentation says ONNX does not support implicit casts, so type changes need to be represented explicitly in the graph. That detail matters in conversion because a framework that silently promotes, casts, or broadcasts values may need a concrete ONNX representation before the exported artifact is equivalent.
Metadata is part of the artifact hygiene. The IR specification says model metadata helps implementations determine whether a model can be executed and helps tools inform humans about purpose and characteristics. The standard optional metadata includes fields such as model author and model license, and newer IR versions also allow metadata on other structures. This is useful, but it is not a full model card or deployment record.
Large ONNX models can store tensor data externally. The external data documentation describes file-location fields, offsets, lengths, and optional checksums. That packaging detail is governance-relevant: a model may be more than one file, and integrity checks, relative paths, and external weight blobs all need to be tracked when the artifact moves between systems.
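One way to track a multi-file artifact is a hash manifest covering the model file and every external data file beside it. The sketch below uses only the Python standard library; the function names and the placeholder file names "model.onnx" and "weights.bin" are hypothetical, and the approach is an ordinary file-integrity technique rather than anything defined by the ONNX specification.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight blobs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict:
    """Map each file's path, relative to the package directory, to size and digest."""
    manifest = {}
    for path in sorted(p for p in package_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(package_dir).as_posix()
        manifest[rel] = {"bytes": path.stat().st_size, "sha256": sha256_of(path)}
    return manifest


# Demo with placeholder files standing in for a model and its external weights.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model.onnx").write_bytes(b"graph-bytes")
    (root / "weights.bin").write_bytes(b"\x00" * 16)
    manifest = build_manifest(root)

print(sorted(manifest))
```

Recording relative paths matters because the model file refers to its external data by location, so a manifest keyed on absolute paths would stop matching once the package moves.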
Export does not make every model portable by itself. Dynamic control flow, custom operators, unusual tensor shapes, precision choices, unsupported operations, external data packaging, and backend-specific kernels can still break conversion or change behavior. ONNX is strongest when the model's computation can be faithfully expressed in its graph and operator vocabulary and then tested on the exact runtime path that will be deployed.
Export and Conversion
The export step is where a framework-native program becomes an ONNX graph. This is not a neutral copy operation. It may trace example inputs, specialize shape constraints, lower framework operations into ONNX operators, hard-code some Python-level values, rewrite control flow, or report unsupported operators.
For PyTorch, the current documentation frames the torch.export-based ONNX exporter as a modern path for PyTorch 2.6 and newer. Its main documentation says setting dynamo=True uses the new export logic based on torch.export.ExportedProgram and is the recommended/default path. The export documentation also exposes inspection and reference-execution tools around the produced ModelProto.
For governance, the exporter is part of the system. A production record should preserve the framework version, exporter version, command or code path, example inputs, dynamic-shape choices, opset target, warnings, conversion report, generated graph, and validation comparison against the source model.
ONNX Runtime
ONNX Runtime is a widely used execution engine associated with the ONNX ecosystem. Its documentation describes it as a cross-platform machine-learning model accelerator with interfaces for hardware-specific libraries.
The key concept is the execution provider. ONNX Runtime can assign nodes or subgraphs to execution providers for CPUs, GPUs, TensorRT, DirectML, mobile, web, edge, and other acceleration paths. Its execution-provider documentation explains that ONNX Runtime uses a GetCapability() interface to allocate supported nodes or subgraphs to the provider library for the available hardware.
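The allocation idea can be illustrated with a toy model. This is a deliberately simplified sketch, not ONNX Runtime's implementation: it assigns single nodes rather than subgraphs, and the Provider class, the provider names, and the supported-operator sets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical stand-in for an execution provider and the ops it claims."""
    name: str
    supported_ops: frozenset


def assign_nodes(node_ops, providers):
    """Give each node to the first provider, in priority order, that supports it."""
    assignment = []
    for op in node_ops:
        owner = next((p.name for p in providers if op in p.supported_ops), None)
        if owner is None:
            raise ValueError(f"no provider supports operator {op!r}")
        assignment.append((op, owner))
    return assignment


# Made-up capability sets: the accelerator lacks one operator the CPU has.
accelerator = Provider("AcceleratorEP", frozenset({"Conv", "Relu", "MatMul"}))
cpu = Provider("CpuEP", frozenset({"Conv", "Relu", "MatMul", "NonMaxSuppression"}))

plan = assign_nodes(
    ["Conv", "Relu", "NonMaxSuppression", "MatMul"], [accelerator, cpu]
)
for op, owner in plan:
    print(f"{op:>18} -> {owner}")
```

Even in this toy version, one unsupported operator splits the graph across two providers, which is why the provider order and the coverage of each provider belong in the deployment record.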
ONNX Runtime makes ONNX operational. The format expresses a model; the runtime loads, optimizes, partitions, and executes it. The runtime documentation says ONNX Runtime applies graph optimizations and partitions the graph based on available hardware-specific accelerators. In deployment settings, that distinction matters: a standard file is useful only when the runtime and backend support are reliable enough for production.
The same documentation is explicit about responsibility. ONNX Runtime validates that a model conforms to the ONNX specification, but users are responsible for testing and validating accuracy, performance, and suitability for their intended use case. It also warns that malicious models may be constructed to consume large amounts of memory or compute, and recommends inspecting untrusted models and testing them in a safe environment before production use.
Deployment Record
A serious ONNX deployment should leave a record that distinguishes three artifacts: the source model, the exported ONNX package, and the executable runtime path. Treating them as one object is how portability claims become unverifiable.
- Source artifact: framework, model checkpoint, tokenizer or preprocessing code, training or fine-tuning lineage, license, model card, evaluation baseline, and intended use.
- Export artifact: exporter or converter, framework version, opset target, IR version, example inputs, dynamic-shape policy, custom operators, conversion warnings, external data files, graph hash, and generated reports.
- Runtime path: runtime version, execution-provider order, provider options, optimization level, quantization or precision mode, hardware target, drivers, container image, fallback behavior, and supported operator coverage.
- Validation evidence: source-versus-export comparison, reference inputs, tolerance thresholds, boundary-shape tests, resource tests, latency and memory measurements, and regression tests after runtime or driver changes.
- Security record: artifact origin, hashes, signatures or attestations where available, dependency scan, custom-operator review, sandbox test result, acceptance decision, and vulnerability-disclosure path.
- Governance record: links to the AI system inventory, AI bill of materials, audit trail, procurement file, and rollback or decommissioning plan.
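A record like the one listed above could be kept as structured data next to the artifact. The following is a minimal sketch using only the Python standard library; the class names, field names, and every value in the example are hypothetical placeholders covering a subset of the fields, not a schema defined by ONNX or any standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExportArtifact:
    exporter: str
    framework_version: str
    opset: int
    ir_version: int
    graph_sha256: str
    external_data_files: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


@dataclass
class RuntimePath:
    runtime_version: str
    provider_order: list
    hardware: str
    precision: str


@dataclass
class DeploymentRecord:
    """Keeps the source, the export, and the runtime path as distinct entries."""
    source_checkpoint: str
    export: ExportArtifact
    runtime: RuntimePath

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# All values below are illustrative placeholders.
record = DeploymentRecord(
    source_checkpoint="<checkpoint identifier>",
    export=ExportArtifact(
        exporter="<exporter name>",
        framework_version="<framework version>",
        opset=13,
        ir_version=8,
        graph_sha256="<sha256 of model file>",
        external_data_files=["weights.bin"],
    ),
    runtime=RuntimePath(
        runtime_version="<runtime version>",
        provider_order=["AcceleratorEP", "CpuEP"],
        hardware="<hardware target>",
        precision="fp32",
    ),
)
print(record.to_json())
```

Keeping the three artifacts as separate nested entries means a runtime upgrade changes only the runtime entry, which makes it easier to see what needs revalidation.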
The ONNX checker and runtime conformance checks are necessary but not enough. They can show that an artifact fits a specification; they do not prove semantic equivalence to the source model, fairness on the deployed population, safety for the use case, or security of every dependency around the model.
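A source-versus-export comparison is, at its simplest, an element-wise tolerance check on outputs from the same reference inputs. The sketch below uses only the Python standard library; the function name, the tolerance values, and the sample numbers are assumptions for illustration, and appropriate tolerances depend on the model, precision mode, and use case.

```python
import math


def compare_outputs(reference, candidate, rel_tol=1e-4, abs_tol=1e-5):
    """Return (index, reference, candidate) for each element outside tolerance."""
    if len(reference) != len(candidate):
        raise ValueError("output lengths differ")
    mismatches = []
    for i, (ref, got) in enumerate(zip(reference, candidate)):
        # math.isclose treats NaN as not close to anything, so NaNs are flagged.
        if not math.isclose(ref, got, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, ref, got))
    return mismatches


# Made-up values standing in for source-model and exported-model outputs.
source_out = [0.1250, 0.5000, 0.3750]
exported_out = [0.1250, 0.5003, 0.3750]

mismatches = compare_outputs(source_out, exported_out)
print(mismatches)
```

A passing check on a handful of inputs is evidence, not proof; the same comparison would need to run across boundary shapes and be repeated after runtime or driver changes.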
Why It Matters
ONNX matters because AI infrastructure is fragmented. Research code, training frameworks, serving systems, mobile platforms, browser runtimes, embedded devices, and accelerator vendors all have different assumptions. A model exchange format reduces the cost of moving a model across that boundary.
It also affects hardware competition. If models can be exported into a common format, hardware vendors can support the format instead of rewriting every framework. That makes it easier for CPUs, NPUs, GPUs, edge accelerators, and inference chips to compete for deployment workloads.
ONNX also supports audit and lifecycle work. A model artifact with a defined graph can be inspected, optimized, quantized, tested, archived, signed, checksummed, and deployed apart from the original training code. That separation is useful for production governance, but it can also obscure the provenance and assumptions of the original training pipeline if teams treat the exported file as self-explanatory.
The practical benefit is therefore conditional interoperability. ONNX can reduce lock-in and make deployment more modular, but it does not erase dependence on exporter quality, runtime behavior, operator coverage, execution-provider priority, hardware kernels, precision choices, packaging, or validation discipline.
Governance and Safety
Evaluate the deployed artifact. A source framework model, an exported ONNX graph, an optimized ONNX Runtime session, and a provider-specific compiled subgraph are not automatically the same operational system. Safety, fairness, reliability, latency, and memory claims should be tested against the artifact and runtime path users actually receive.
Preserve provenance. Consequential deployments should record the original model source, training or fine-tuning lineage, exporter, opset, IR version, conversion report, graph hash, external data files, optimizer passes, runtime version, execution-provider order, hardware target, drivers, precision settings, custom operators, and fallback policy.
Watch partitioning and fallback. ONNX Runtime can partition one graph across execution providers and use the default provider for operators that cannot be pushed to a specialized provider. That behavior is useful, but it means a "GPU deployment" or "NPU deployment" may still run parts of the graph somewhere else. Logs, tests, and incident records should show which provider executed which subgraph.
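A first, coarse check is to compare the provider order a deployment requested with the providers the session actually ended up with. The helper below is a hypothetical sketch in plain Python; the function name and report fields are assumptions, and the provider names in the example are used only as sample strings.

```python
def provider_report(requested, active):
    """Compare the requested provider order with what the session resolved."""
    requested = list(requested)
    active = list(active)
    return {
        "requested": requested,
        "active": active,
        # Providers that were asked for but are not active in the session.
        "dropped": [p for p in requested if p not in active],
        # Providers that are active although nobody asked for them.
        "unexpected": [p for p in active if p not in requested],
        "matches_request": requested == active,
    }


# Example: an accelerator provider was requested but only the CPU one is active.
report = provider_report(
    requested=["CUDAExecutionProvider", "CPUExecutionProvider"],
    active=["CPUExecutionProvider"],
)
print(report)
```

With ONNX Runtime's Python package, the active list could come from calling get_providers() on an InferenceSession created with a providers argument. This check only shows which providers are registered for the session; showing which provider executed which subgraph still requires the runtime's own logs or profiling output.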
Treat untrusted models as supply-chain inputs. ONNX files are portable, but portability means they can arrive from many model hubs, vendors, contractors, research repositories, and internal experiments. Load them with ordinary secure-development discipline: verify source and checksums, inspect metadata and graph structure, restrict custom operators, test in a safe environment, set resource limits, and document the acceptance decision.
Do not confuse metadata with governance. ONNX metadata can carry author, license, documentation strings, and other key-value fields. It should complement, not replace, model cards and system cards, evaluation reports, incident plans, audit trails, and procurement records.
Watch external data and derivative artifacts. Large models may be split across ONNX files and external tensor data. Quantized, optimized, or execution-provider-compiled variants may become new artifacts with their own hashes, dependencies, failure modes, and legal obligations.
Central Tensions
- Portability and exactness: a model can export successfully while still behaving differently because of operator semantics, precision, shape handling, or backend support.
- Standard and implementation: a public format is only as strong as the converters, runtimes, tests, and execution providers around it.
- Framework velocity and stable exchange: research frameworks evolve quickly, while an interoperability standard has to preserve compatibility across many tools.
- Open ecosystem and vendor optimization: ONNX can reduce lock-in, but high performance still often depends on vendor-specific kernels, libraries, and hardware paths.
- Model artifact and model history: exporting a graph helps deployment, but it does not automatically preserve the dataset, training recipe, evaluation trail, licensing context, or intended-use limits.
- Validation and convenience: an exported model is easy to move, while serious deployment requires comparison tests, boundary-shape tests, resource tests, and monitoring after runtime or driver changes.
Source Discipline
Claims about ONNX should distinguish the specification, a particular release, an exporter, a runtime, an execution provider, and a benchmark. The ONNX spec can establish graph semantics and versioning. ONNX Runtime documentation can establish runtime architecture and warnings. PyTorch documentation can establish current PyTorch export behavior. None of those sources alone proves that a specific exported model is production-safe.
Version claims require extra care. The live documentation build, GitHub release page, PyPI package page, runtime compatibility table, and framework exporter docs may each refer to a different version boundary. Name which one is being cited.
Prefer primary sources: ONNX documentation and repository materials, ONNX Runtime documentation, framework exporter docs, Linux Foundation project pages, official release notes, standards-body publications, and reproducible benchmark papers. Treat vendor speed claims and tutorial snippets as contextual unless they include model, opset, runtime version, provider, hardware, precision, batch shape, and date.
For governance claims, cite risk-management or secure-development sources separately. ONNX is a technical format; documentation, procurement, safety cases, software supply-chain controls, vulnerability disclosure, and incident response are organizational practices built around the artifact.
Spiralist Reading
ONNX is the passport office for machine intelligence.
The model wants to move: from notebook to service, from lab GPU to phone, from cloud to browser, from one company's framework to another company's chip. ONNX gives that movement a bureaucratic form: graph, operator, tensor, version, runtime.
For Spiralism, ONNX matters because portability is power. The easier a model is to move, the faster intelligence becomes infrastructure. But every translation can erase context. A portable model still needs memory around it: provenance, evaluations, permissions, limitations, and a record of what was lost in conversion.
Open Questions
- What minimum provenance fields should accompany ONNX artifacts in regulated or public-sector deployments?
- Should model cards and system cards identify the exact exported and optimized artifacts users interact with?
- How should teams audit behavior when one ONNX graph is partitioned across multiple execution providers?
- What security baseline should apply to model hubs and procurement workflows that distribute ONNX files with external data?
- How much portability should procurement require before a deployment is treated as meaningfully exit-capable from one vendor's stack?
Related Pages
- TensorFlow
- PyTorch
- AI Compiler Stacks
- AI Inference Providers
- vLLM
- Model Quantization
- Triton GPU Programming
- CUDA
- AMD ROCm and Instinct
- Tensor Processing Units
- LLM Serving and KV Cache
- AI Evaluations
- AI Safety Cases
- Model Cards and System Cards
- AI System Inventory
- AI Bill of Materials
- AI Audit Trails
- AI Procurement
- AI Vulnerability Disclosure
- AI Audits and Assurance
- Secure AI System Development
- Model Weight Security
- Model Routing and AI Gateways
- NIST AI Risk Management Framework
- Open-Weight AI Models
Sources
- ONNX, About ONNX, reviewed June 24, 2026.
- ONNX, project homepage, reviewed June 24, 2026.
- ONNX, ONNX Concepts, live documentation rendered as ONNX 1.23.0 at review; reviewed June 24, 2026.
- ONNX, Open Neural Network Exchange Intermediate Representation Specification, reviewed June 24, 2026.
- ONNX, ONNX Versioning, reviewed June 24, 2026.
- ONNX, External Data, reviewed June 24, 2026.
- ONNX, ONNX Security Assurance Case, draft dated February 2026; reviewed June 24, 2026.
- ONNX GitHub, onnx/onnx repository, reviewed June 24, 2026.
- ONNX GitHub, ONNX releases, v1.22.0 marked latest at review; reviewed June 24, 2026.
- PyPI, onnx package page, reviewed June 24, 2026.
- ONNX Runtime, documentation overview, reviewed June 24, 2026.
- ONNX Runtime, Execution Providers, reviewed June 24, 2026.
- ONNX Runtime, Architecture, reviewed June 24, 2026.
- ONNX Runtime, Compatibility, reviewed June 24, 2026.
- Microsoft Azure Blog, Announcing ONNX 1.0: An open ecosystem for AI, December 2017; reviewed June 24, 2026.
- Microsoft Open Source Blog, ONNX joins Linux Foundation, November 14, 2019; reviewed June 24, 2026.
- LF AI & Data, ONNX project page, reviewed June 24, 2026.
- LF AI & Data, LF AI welcomes ONNX as a graduate project, November 14, 2019; reviewed June 24, 2026.
- Meta Engineering, AI at F8 2018: Open frameworks and responsible development, May 2, 2018; reviewed June 24, 2026.
- PyTorch Docs, torch.onnx documentation, reviewed June 24, 2026.
- PyTorch Docs, torch.export-based ONNX Exporter, reviewed June 24, 2026.
- NIST, AI Risk Management Framework, reviewed June 24, 2026.
- NIST, Secure Software Development Framework, reviewed June 24, 2026.
- NIST, SP 800-218A: Secure Software Development Practices for Generative AI and Dual-Use Foundation Models, July 2024; reviewed June 24, 2026.