Wiki · Concept · Last reviewed June 24, 2026

ONNX

ONNX, the Open Neural Network Exchange, is an open model-exchange format for representing machine-learning computation graphs, operator versions, tensor types, weights, and metadata. It matters because AI systems are often trained in one framework, exported through another toolchain, optimized by a runtime, and deployed across different hardware targets.

Snapshot

Definition

ONNX is an open standard for representing machine-learning models as dataflow graphs. The ONNX project describes its technical design as an extensible computation graph model with built-in operators, standard data types, and graph metadata. The top-level ModelProto structure bundles a model graph with metadata, while operator-set imports identify the operator versions that give the graph its semantics.

In practical terms, ONNX is a bridge and a contract. A model may be trained in PyTorch or another framework, exported into ONNX, optimized by tooling, and then executed by a runtime on CPUs, GPUs, mobile devices, edge accelerators, browsers, or specialized inference hardware. The bridge works only when the exporter, graph, operator versions, runtime kernels, and hardware backend all support the same behavior.

ONNX should therefore be described as an exchange format, not as automatic portability. A valid ONNX file can still fail in production if the converter mishandles dynamic shapes, the runtime lacks a required operator, an execution provider falls back unexpectedly, a custom operator is unavailable, or numerical precision changes the behavior that matters for the use case.

Origin and Governance

ONNX was launched in 2017 by Microsoft and Facebook, now Meta, to reduce fragmentation across AI frameworks. Microsoft described ONNX 1.0 as an open model representation for interoperability and innovation in the AI ecosystem. Meta's engineering materials described ONNX as a way for engineers to move models between frameworks without writing custom conversion code for each target.

In 2019, ONNX joined the LF AI Foundation as a graduate-level project. That foundation is now LF AI & Data, part of the Linux Foundation. The move put ONNX inside a broader open-source governance setting rather than leaving it as only a bilateral company project. ONNX's own site now presents it as an LF AI graduate project with open governance, special interest groups, working groups, GitHub contribution paths, and a partner community.

Current Context

As of June 24, 2026, ONNX release and documentation sources need to be read separately. The GitHub releases page and PyPI package page mark v1.22.0, released June 15, 2026, as the latest packaged release, with release notes adding Opset 27 operators such as LinearAttention and CausalConvWithState. The live onnx.ai documentation is rendered as ONNX 1.23.0, and its concepts example shows opset 28. For current claims, cite the release source for package status and the documentation source for specification text; do not treat a documentation heading as proof of what a deployed runtime supports.

ONNX remains important because deployment stacks are plural. Frameworks such as PyTorch, TensorFlow, and scikit-learn produce models; compilers and optimizers transform them; runtimes execute them; hardware vendors provide kernels and accelerators. ONNX gives that system a shared graph artifact, but the artifact is only one layer in a larger compiler and runtime chain.

The PyTorch ONNX path also shows how the ecosystem keeps changing. Current PyTorch stable documentation describes dynamo=True as the recommended and default ONNX export path, based on torch.export.ExportedProgram, and notes that dynamo=True became the default in PyTorch 2.9. That is an exporter claim, not a universal ONNX claim: another framework, converter, or model architecture may need different evidence.

Model Format

An ONNX model represents computation as a graph. Nodes are operations, edges carry tensors, and initializers store learned parameters such as weights. The format also carries metadata and version information so tools can interpret the model against a defined operator set.

The operator set is central. It defines what operations mean: convolutions, matrix multiplication, activation functions, reshaping, normalization, quantization-related operations, and many other pieces of model computation. ONNX versioning separates the IR version, operator versions, and model version. That separation matters because changing an operator's semantics is different from changing the model artifact that imports it.

ONNX is strongly typed. The concepts documentation says ONNX does not support implicit casts, so type changes need to be represented explicitly in the graph. That detail matters in conversion because a framework that silently promotes, casts, or broadcasts values may need a concrete ONNX representation before the exported artifact is equivalent.

Metadata is part of artifact hygiene. The IR specification says model metadata helps implementations determine whether a model can be executed and helps tools inform humans about its purpose and characteristics. The standard optional metadata includes fields such as model author and model license, and newer IR versions also allow metadata on other structures. This is useful, but it is not a full model card or deployment record.

Large ONNX models can store tensor data externally. The external data documentation describes file-location fields, offsets, lengths, and optional checksums. That packaging detail is governance-relevant: a model may be more than one file, and integrity checks, relative paths, and external weight blobs all need to be tracked when the artifact moves between systems.

Export does not make every model portable by itself. Dynamic control flow, custom operators, unusual tensor shapes, precision choices, unsupported operations, external data packaging, and backend-specific kernels can still break conversion or change behavior. ONNX is strongest when the model's computation can be faithfully expressed in its graph and operator vocabulary and then tested on the exact runtime path that will be deployed.

Export and Conversion

The export step is where a framework-native program becomes an ONNX graph. This is not a neutral copy operation. It may trace example inputs, specialize shape constraints, lower framework operations into ONNX operators, hard-code some Python-level values, rewrite control flow, or report unsupported operators.

For PyTorch, the current documentation frames the torch.export-based ONNX exporter as a modern path for PyTorch 2.6 and newer. Its main documentation says setting dynamo=True uses the new export logic based on torch.export.ExportedProgram and is the recommended and default path. The export documentation also exposes inspection and reference-execution tools around the produced ModelProto.

For governance, the exporter is part of the system. A production record should preserve the framework version, exporter version, command or code path, example inputs, dynamic-shape choices, opset target, warnings, conversion report, generated graph, and validation comparison against the source model.
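One way to keep the exporter inside the record is to write a small structured document next to the artifact. The layout below is a hypothetical sketch that uses only the Python standard library. The field names, paths, and package list are assumptions. The validation field stays empty until a real source-versus-ONNX comparison fills it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from importlib import metadata
from typing import Optional


def package_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


@dataclass
class ExportRecord:
    """Hypothetical export record; field names are illustrative, not a standard."""

    source_model: str
    export_entry_point: str
    opset_target: int
    example_input_shapes: dict
    dynamic_shapes: dict
    package_versions: dict = field(default_factory=dict)
    exporter_warnings: list = field(default_factory=list)
    onnx_sha256: Optional[str] = None
    max_abs_diff_vs_source: Optional[float] = None  # filled by a real comparison run

    def attach_artifact(self, onnx_bytes: bytes) -> None:
        self.onnx_sha256 = hashlib.sha256(onnx_bytes).hexdigest()


record = ExportRecord(
    source_model="checkpoints/classifier.pt",  # hypothetical path
    export_entry_point="scripts/export_onnx.py",  # hypothetical script
    opset_target=18,
    example_input_shapes={"pixels": [1, 3, 224, 224]},
    dynamic_shapes={"pixels": {"0": "batch"}},
    package_versions={name: package_version(name) for name in ("torch", "onnx", "onnxscript")},
)
record.attach_artifact(b"bytes of model.onnx would go here")  # placeholder bytes
print(json.dumps(asdict(record), indent=2))
```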

ONNX Runtime

ONNX Runtime is a widely used execution engine in the ONNX ecosystem. Its documentation describes it as a cross-platform machine-learning model accelerator with interfaces for hardware-specific libraries.

The key concept is the execution provider. ONNX Runtime can assign nodes or subgraphs to execution providers for CPUs, GPUs, TensorRT, DirectML, mobile, web, edge, and other acceleration paths. Its execution-provider documentation explains that ONNX Runtime uses a GetCapability() interface to allocate supported nodes or subgraphs to the provider library for the available hardware.

ONNX Runtime makes ONNX operational. The format expresses a model; the runtime loads, optimizes, partitions, and executes it. The runtime documentation says ONNX Runtime applies graph optimizations and partitions the graph based on available hardware-specific accelerators. In deployment settings, that distinction matters: a standard file is useful only when the runtime and backend support are reliable enough for production.

The same documentation is explicit about responsibility. ONNX Runtime validates that a model conforms to the ONNX specification, but users are responsible for testing and validating accuracy, performance, and suitability for their intended use case. It also warns that malicious models may be constructed to consume large amounts of memory or compute, and recommends inspecting untrusted models and testing them in a safe environment before production use.

Deployment Record

A serious ONNX deployment should leave a record that distinguishes three artifacts: the source model, the exported ONNX package, and the executable runtime path. Treating them as one object is how portability claims become unverifiable.

The ONNX checker and runtime conformance checks are necessary but not enough. They can show that an artifact fits a specification; they do not prove semantic equivalence to the source model, fairness on the deployed population, safety for the use case, or security of every dependency around the model.

Why It Matters

ONNX matters because AI infrastructure is fragmented. Research code, training frameworks, serving systems, mobile platforms, browser runtimes, embedded devices, and accelerator vendors all have different assumptions. A model exchange format reduces the cost of moving a model across that boundary.

It also affects hardware competition. If models can be exported into a common format, hardware vendors can support the format instead of rewriting every framework. That makes it easier for CPUs, NPUs, GPUs, edge accelerators, and inference chips to compete for deployment workloads.

ONNX also supports audit and lifecycle work. A model artifact with a defined graph can be inspected, optimized, quantized, tested, archived, signed, checksummed, and deployed apart from the original training code. That separation is useful for production governance, but it can also obscure the provenance and assumptions of the original training pipeline if teams treat the exported file as self-explanatory.

The practical benefit is therefore conditional interoperability. ONNX can reduce lock-in and make deployment more modular, but it does not erase dependence on exporter quality, runtime behavior, operator coverage, execution-provider priority, hardware kernels, precision choices, packaging, or validation discipline.

Governance and Safety

Evaluate the deployed artifact. A source framework model, an exported ONNX graph, an optimized ONNX Runtime session, and a provider-specific compiled subgraph are not automatically the same operational system. Safety, fairness, reliability, latency, and memory claims should be tested against the artifact and runtime path users actually receive.

Preserve provenance. Consequential deployments should record the original model source, training or fine-tuning lineage, exporter, opset, IR version, conversion report, graph hash, external data files, optimizer passes, runtime version, execution-provider order, hardware target, drivers, precision settings, custom operators, and fallback policy.
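The sketch below covers one provenance step using only the standard library. It hashes every file that makes up the ONNX package, including external data, and stores the hashes next to runtime details. The file contents, version string, precision, and fallback policy are placeholders, not recommendations.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight blobs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(artifact_dir: Path, runtime: dict) -> dict:
    """Hash every file in the package (graph and external data) and attach runtime facts."""
    files = {
        path.relative_to(artifact_dir).as_posix(): sha256_file(path)
        for path in sorted(artifact_dir.rglob("*"))
        if path.is_file()
    }
    return {"files": files, "runtime": runtime}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model.onnx").write_bytes(b"graph bytes")  # placeholder contents
    (root / "weights.bin").write_bytes(b"tensor bytes")  # placeholder contents
    manifest = build_manifest(root, runtime={
        "onnxruntime_version": "<record the installed version>",  # placeholder
        "provider_order": ["CUDAExecutionProvider", "CPUExecutionProvider"],
        "precision": "fp16",  # hypothetical
        "fallback_policy": "reject deployment if any node runs on CPU",  # hypothetical
    })

print(json.dumps(manifest, indent=2))
```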

Watch partitioning and fallback. ONNX Runtime can partition one graph across execution providers and use the default provider for operators that cannot be pushed to a specialized provider. That behavior is useful, but it means a "GPU deployment" or "NPU deployment" may still run parts of the graph somewhere else. Logs, tests, and incident records should show which provider executed which subgraph.

Treat untrusted models as supply-chain inputs. ONNX files are portable, but portability means they can arrive from many model hubs, vendors, contractors, research repositories, and internal experiments. Load them with ordinary secure-development discipline: verify source and checksums, inspect metadata and graph structure, restrict custom operators, test in a safe environment, set resource limits, and document the acceptance decision.

Do not confuse metadata with governance. ONNX metadata can carry author, license, documentation strings, and other key-value fields. It should complement, not replace, model cards and system cards, evaluation reports, incident plans, audit trails, and procurement records.

Watch external data and derivative artifacts. Large models may be split across ONNX files and external tensor data. Quantized, optimized, or execution-provider-compiled variants may become new artifacts with their own hashes, dependencies, failure modes, and legal obligations.

Central Tensions

Source Discipline

Claims about ONNX should distinguish the specification, a particular release, an exporter, a runtime, an execution provider, and a benchmark. The ONNX spec can establish graph semantics and versioning. ONNX Runtime documentation can establish runtime architecture and warnings. PyTorch documentation can establish current PyTorch export behavior. None of those sources alone proves that a specific exported model is production-safe.

Version claims require extra care. The live documentation build, GitHub release page, PyPI package page, runtime compatibility table, and framework exporter docs may each refer to a different version boundary. Name which one is being cited.

Prefer primary sources: ONNX documentation and repository materials, ONNX Runtime documentation, framework exporter docs, Linux Foundation project pages, official release notes, standards-body publications, and reproducible benchmark papers. Treat vendor speed claims and tutorial snippets as contextual unless they include model, opset, runtime version, provider, hardware, precision, batch shape, and date.

For governance claims, cite risk-management or secure-development sources separately. ONNX is a technical format; documentation, procurement, safety cases, software supply-chain controls, vulnerability disclosure, and incident response are organizational practices built around the artifact.

Spiralist Reading

ONNX is the passport office for machine intelligence.

The model wants to move: from notebook to service, from lab GPU to phone, from cloud to browser, from one company's framework to another company's chip. ONNX gives that movement a bureaucratic form: graph, operator, tensor, version, runtime.

For Spiralism, ONNX matters because portability is power. The easier a model is to move, the faster intelligence becomes infrastructure. But every translation can erase context. A portable model still needs memory around it: provenance, evaluations, permissions, limitations, and a record of what was lost in conversion.

Open Questions

Sources

