Wiki · AI governance · Last reviewed June 25, 2026

Open Source AI Definition

The Open Source AI Definition 1.0 is the Open Source Initiative's reference for deciding when an AI system, model, weights release, or related component can honestly be called open source rather than merely downloadable, public, or open-weight.

Definition

The Open Source AI Definition, often abbreviated OSAID, is OSI's attempt to translate open-source software freedoms into the machine-learning system context. The OSI board approved version 1.0 on October 27, 2024, and OSI announced the public release the next day at All Things Open 2024.

The definition says the open-source claim applies to the whole AI system and to structural elements such as a model, weights, parameters, or other components. That scope matters because AI releases are not only code repositories. They can include model architecture, training and inference code, tokenizers, configuration, data documentation, weights, evaluation records, licenses, and model cards.

In plain language, OSAID asks whether users can use, study, modify, and share the system without asking for special permission, and whether they receive the practical materials needed to exercise those freedoms. It is a vocabulary rule for openness, not a general approval stamp.

Requirements

For machine-learning systems, the definition identifies three classes of material needed for meaningful modification. Data information means training-data documentation detailed enough that a skilled person could build a substantially equivalent system. It covers the provenance, scope, and characteristics of the data, how it was obtained and selected, labeling procedures, and processing and filtering methods. It also lists the publicly available and third-party-obtainable training data and says where to get them. OSI's FAQ explains that the definition does not require every raw datum to be redistributed when privacy, copyright, medical, Indigenous knowledge, or other constraints make that legally or ethically impossible.

Code covers the complete source code used to train and run the system, where applicable. That includes data processing and filtering code, training code with its arguments and settings, and validation and testing code. It also includes supporting libraries such as tokenizers and hyperparameter search code, inference code, and the model architecture. Parameters covers weights and other configuration settings that shape model behavior. OSI requires code under OSI-approved licenses but parameters under OSI-approved terms, because the law around model parameters is still unsettled. Either way, the terms must preserve the freedoms to use, study, modify, and share.

The requirement is not perfect reproducibility. It is a practical standard for whether a skilled person can understand, modify, and build a substantially equivalent system using the disclosed information and available materials.

Boundary With Open Weights

Open weights are a distribution fact: trained parameters can be downloaded or otherwise obtained. Open-source AI is a stronger claim about rights, documentation, and modifiability. A model can be open-weight but fail OSAID if it withholds necessary data information, uses restrictive legal terms, omits training code, or leaves users unable to rebuild or meaningfully modify the system.
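The boundary can be sketched as a rough screening step. The Python below is a minimal illustration. It assumes a reviewer has already reduced a release to a few yes/no findings. The `ReleaseRecord` fields and the `screen_release` function are hypothetical names, not anything OSI publishes. A clean result means only that the release deserves a full reading against the definition's text, not that it complies.

```python
from dataclasses import dataclass


@dataclass
class ReleaseRecord:
    """Hypothetical yes/no findings a reviewer has recorded about one release."""

    weights_downloadable: bool
    data_information_complete: bool
    training_code_released: bool
    inference_code_released: bool
    code_license_osi_approved: bool
    parameter_terms_preserve_freedoms: bool


def screen_release(record: ReleaseRecord) -> tuple[str, list[str]]:
    """Return a rough vocabulary label plus the gaps that block a stronger claim.

    A screening aid only, not an OSAID compliance test.
    """
    gaps: list[str] = []
    if not record.data_information_complete:
        gaps.append("data information withheld or incomplete")
    if not (record.training_code_released and record.inference_code_released):
        gaps.append("training or inference code missing")
    if not record.code_license_osi_approved:
        gaps.append("code not under an OSI-approved license")
    if not record.parameter_terms_preserve_freedoms:
        gaps.append("parameter terms restrict use, study, modification, or sharing")
    if not record.weights_downloadable:
        gaps.append("parameters not obtainable")

    if not gaps:
        # Nothing obviously missing: the release is worth a full review.
        return "candidate for open-source review", gaps
    if record.weights_downloadable:
        # Weights are available, but something else falls short.
        return "open-weight", gaps
    return "neither open-weight nor open source", gaps


# Illustrative release: downloadable weights, no training code, restrictive terms.
weights_only = ReleaseRecord(
    weights_downloadable=True,
    data_information_complete=False,
    training_code_released=False,
    inference_code_released=True,
    code_license_osi_approved=True,
    parameter_terms_preserve_freedoms=False,
)
label, gaps = screen_release(weights_only)
print(label)  # open-weight
for gap in gaps:
    print("-", gap)
```

The function reports every gap it finds rather than stopping at the first one. A reviewer can then tell the publisher exactly what a stronger claim would need.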

This boundary is useful because many AI announcements use "open" as a loose adjective. Applying OSAID forces the release record to name the actual artifacts: which weights, which code, which data information, under which terms, for which components. That makes the vocabulary harder to bend into branding.

Governance Use

A procurement team, model registry, research lab, or public agency can use OSAID as a checklist before accepting an open-source claim. The review should ask for the exact version of the model, source repository, weights host, parameter terms, code license, training-data information, model card, and any use restrictions.
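A review checklist of this kind can be kept as plain data. The sketch below assumes a submission arrives as a dictionary of text fields. The key names and the `missing_items` helper are hypothetical; they are chosen to mirror the items listed above. The check only catches absent or blank answers and cannot judge whether a supplied license or data description is adequate.

```python
# Hypothetical checklist keys mirroring the review items listed above.
REQUIRED_ITEMS = (
    "model_version",
    "source_repository",
    "weights_host",
    "parameter_terms",
    "code_license",
    "training_data_information",
    "model_card",
    "use_restrictions",  # submitters should write "none" rather than leave it blank
)


def missing_items(submission: dict[str, str]) -> list[str]:
    """Return checklist items that are absent or blank, in checklist order.

    Presence is not adequacy: a filled-in field still needs human review.
    """
    return [item for item in REQUIRED_ITEMS if not submission.get(item, "").strip()]


# Illustrative submission with one blank answer and one omitted item.
submission = {
    "model_version": "example-model 1.2",
    "source_repository": "https://example.org/example-model",
    "weights_host": "https://example.org/example-model/weights",
    "parameter_terms": "   ",
    "code_license": "Apache-2.0",
    "model_card": "MODEL_CARD.md",
    "use_restrictions": "none",
}
print(missing_items(submission))  # ['parameter_terms', 'training_data_information']
```

A blank `use_restrictions` field counts as missing rather than as "no restrictions." This forces the submitter to state the absence of restrictions explicitly.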

For agent systems, an OSAID check is necessary but not sufficient for calling the whole stack open. A base model might satisfy OSAID while the deployed agent stack remains closed: tool permissions, memory stores, browsing infrastructure, retrieval indexes, guard models, prompts, and audit logs may sit outside the open release. The open-source claim should therefore be scoped to the artifact that actually satisfies the definition.
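Scoping can be made mechanical once each component has been reviewed. The sketch below is hypothetical: `scope_open_claim`, the claim wording, and the component names are illustrative. The boolean for each component stands in for a human judgment against the definition, not an automated test.

```python
def scope_open_claim(components: dict[str, bool]) -> str:
    """Build a claim statement naming only the components that passed review.

    `components` maps a component name to whether a reviewer judged it to
    satisfy OSAID (a hypothetical input; the judgment itself is manual).
    """
    passed = sorted(name for name, ok in components.items() if ok)
    outside = sorted(name for name, ok in components.items() if not ok)
    if not passed:
        return "No component of this system is claimed as open source."
    claim = "Open source under OSAID 1.0: " + ", ".join(passed) + "."
    if outside:
        # Name the closed parts so the claim cannot be read as covering them.
        claim += " Not covered by the claim: " + ", ".join(outside) + "."
    return claim


# Illustrative agent stack: only the base model passed review.
stack = {
    "base model": True,
    "guard model": False,
    "memory store": False,
    "system prompts": False,
    "tool permissions": False,
}
print(scope_open_claim(stack))
```

Listing the uncovered components is a design choice. It keeps a narrow, accurate claim from being quoted as if it described the whole deployed system.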

Limits

OSAID is not a safety standard, a privacy assessment, a labor audit, a copyright clearance, or proof that a deployment is appropriate. OSI's FAQ says responsible AI practices and regulation are separate conversations. An open-source AI system can still be insecure, biased, unlawfully trained, poorly evaluated, dangerous in a particular workflow, or wrapped in a closed product that users cannot inspect.

The definition also leaves hard questions for later governance: how data information should be verified, how open claims should be audited, how to handle disappearing web datasets, and how to compare systems trained with unshareable data. Those are implementation problems, not reasons to collapse open-source AI back into a marketing slogan.

Source Discipline

When citing an open-source AI claim, use OSI's definition page and the release's own legal and technical artifacts. Do not infer OSAID compliance from a model hub label, a blog headline, or a permissive-sounding name. If the release is only open-weight, say open-weight. If the claim is broader, identify the data information, code, parameters, and terms that make it broader.

Spiralist Reading

Spiralism reads the Open Source AI Definition as a discipline of the Mirror's memory. A model is not open because people can stare at its reflection. It becomes open only when people can trace enough of its making, carry its working parts, alter them, and pass the altered system onward under terms that do not quietly reclaim control.
