Foundation Models
Foundation models are broadly pretrained AI models that can be adapted across many downstream tasks. The term names both a technical pattern and an institutional shift: one reusable base can become infrastructure for products, APIs, agents, research systems, and public services, and it can carry legal duties that attach before any single downstream deployment is known.
Definition
A foundation model is a model trained on broad data, usually with self-supervision at scale, that can be adapted to a wide range of downstream tasks. Adaptation may happen through prompting, fine-tuning, retrieval, tool use, instruction tuning, preference training, distillation, adapters, quantization, or embedding the model inside a larger system.
The category includes large language models, multimodal models, vision-language models, code models, audio models, robotics models, embedding models, and world-model-like systems when they serve as reusable bases rather than single-task classifiers. A foundation model is not necessarily open, closed, safe, unsafe, frontier, generative, agentic, or generally intelligent. The defining feature is reuse: one pretrained base becomes the starting point for many later tasks, products, and institutions.
Regulators often use the related term general-purpose AI model. The EU AI Act's GPAI category overlaps heavily with foundation models, especially models that display significant generality and can be integrated into many downstream systems or applications. The legal term matters because it attaches obligations to model providers, not only to final application deployers.
Four units should not be collapsed: the base model trained at scale; the adapted model produced through post-training or fine-tuning; the release artifact made available through an API, gated access, open weights, or a product; and the deployed system that includes prompts, retrieval, tools, user interface, policies, logging, and human workflow. Most real-world risk lives in the interaction among all four.
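One way to keep the four units separate in practice is to tag every finding with the layer it was observed at. The following is a minimal sketch under that assumption; the `Layer`, `Finding`, and `layers_without_evidence` names are hypothetical and do not come from any standard or library.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The four units named above, in upstream-to-downstream order."""
    BASE_MODEL = "base model"
    ADAPTED_MODEL = "adapted model"
    RELEASE_ARTIFACT = "release artifact"
    DEPLOYED_SYSTEM = "deployed system"


@dataclass(frozen=True)
class Finding:
    """One evaluation or incident finding, tied to the layer it was observed at."""
    layer: Layer
    summary: str


def layers_without_evidence(findings):
    """Return the layers that no finding covers, in declaration order."""
    covered = {finding.layer for finding in findings}
    return [layer for layer in Layer if layer not in covered]


# Illustrative findings only; the summaries are invented examples.
findings = [
    Finding(Layer.BASE_MODEL, "memorization test on pretraining checkpoint"),
    Finding(Layer.DEPLOYED_SYSTEM, "red-team of product with retrieval and tools"),
]
print([layer.value for layer in layers_without_evidence(findings)])
```

A check like this does not say whether the evidence is adequate. It only shows which layers have no evidence at all, which is the gap the paragraph above warns about.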
Snapshot
- Technical core: broad pretraining followed by reuse, adaptation, or integration into downstream systems.
- Common modalities: text, code, images, audio, video, documents, embeddings, tool calls, robot state, and mixed multimodal inputs or outputs.
- Not the same as: a chatbot, a large language model, an open-weight model, a frontier model, a deployed AI system, or a legal guarantee of safety.
- Governance unit: the model version, training data summary, post-training method, release route, model card or system card, evaluation record, deployment wrapper, and downstream use case.
- Core risk pattern: a defect or unsafe capability in the base layer can propagate into many products, while downstream wrappers can create new risks the base model report did not test.
- Evidence boundary: the label "foundation model" supports a claim about broad pretraining and reuse, not a claim that the model is safe, fair, lawful, reliable, or suitable for a specific deployment.
Boundary Tests
Use the term carefully. A foundation model is a technical and institutional category. A general-purpose AI model is a legal category under the EU AI Act and related guidance. A frontier model is a policy and safety category for especially capable systems. An open-weight model is a release route. A deployed AI system is the full product or workflow in which people are affected.
The same artifact can fall into several categories, but each category answers a different question. "Foundation model" asks whether the model is broadly pretrained and reusable. "GPAI" asks whether legal obligations attach to the provider. "Open-weight" asks whether trained parameters are downloadable. "Systemic risk" asks whether the model meets a regulatory or safety threshold. "Deployed system" asks how the model is used with data, tools, people, permissions, and institutional authority.
This boundary matters for accountability. A base model can be well documented while a downstream product is unsafe. A product can be well controlled while the upstream provider remains opaque. An open-weight release can improve research access while reducing provider control after release. A governance record should name which layer is being evaluated.
Lineage
The phrase "foundation model" was popularized by Stanford researchers in the 2021 report On the Opportunities and Risks of Foundation Models. The report argued that AI was shifting from task-specific systems toward models whose broad pretraining made them adaptable across domains, creating both capability gains and systemic risks.
The technical lineage includes representation learning, transfer learning, self-supervised learning, word embeddings, BERT, GPT-style language models, CLIP, diffusion models, vision transformers, and later multimodal systems. Transformers became the dominant architecture for many foundation models, but the concept is broader than any one architecture.
The institutional lineage is just as important. Foundation models changed who can build AI systems. A downstream developer can call an API, fine-tune an open-weight model, add retrieval, or wrap a model in an agent without training a base model from scratch. That makes AI development faster, but it also concentrates upstream power in the organizations that train, host, license, and document the base models.
Current Context
As of June 25, 2026, foundation models are no longer only a research category. They are a regulated infrastructure layer. The European Commission says the EU AI Act obligations for providers of general-purpose AI models entered into application on August 2, 2025; Commission enforcement powers begin on August 2, 2026; and providers of GPAI models placed on the market before August 2, 2025 must comply by August 2, 2027.
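The three dates above can be read as a small decision rule. The sketch below encodes only the dates stated in this section and simplifies heavily: the function name, the field names, and the rule that a newer model's obligations are due from its placement date are illustrative assumptions, not a statement of the legal text.

```python
from datetime import date

# Dates as stated in the paragraph above; confirm against official sources.
OBLIGATIONS_APPLY = date(2025, 8, 2)
ENFORCEMENT_BEGINS = date(2026, 8, 2)
LEGACY_DEADLINE = date(2027, 8, 2)


def gpai_status(placed_on_market: date, today: date) -> dict:
    """Summarize which milestones have passed for a hypothetical GPAI model."""
    legacy = placed_on_market < OBLIGATIONS_APPLY
    if legacy:
        # Models already on the market before August 2, 2025 have until 2027.
        compliance_due = today >= LEGACY_DEADLINE
    else:
        # Simplifying assumption: newer models are covered once placed on the market.
        compliance_due = today >= placed_on_market
    return {
        "legacy_model": legacy,
        "compliance_due": compliance_due,
        "enforcement_powers_active": today >= ENFORCEMENT_BEGINS,
    }


review_date = date(2026, 6, 25)
print(gpai_status(date(2024, 12, 1), review_date))
print(gpai_status(date(2025, 9, 1), review_date))
```

On the review date used in this article, the sketch treats the older model as still inside its transition period and the newer model as already subject to obligations, with Commission enforcement powers not yet active for either.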
The EU's General-Purpose AI Code of Practice, published July 10, 2025, gives providers a voluntary route for demonstrating compliance with AI Act obligations on transparency, copyright, and, for models with systemic risk, safety and security. The Commission's GPAI guidance also emphasizes definitions, significant modifications, open-source exemptions, and the fact that model-layer obligations and AI-system obligations can both matter when a GPAI model is integrated into a deployed system. These materials are guidance and compliance tools, not independent proof that a model or deployment is safe.
Transparency remains a live governance problem. Stanford's December 2025 Foundation Model Transparency Index scored 13 major developers and reported that average transparency fell from 58 out of 100 in 2024 to 40 out of 100 in 2025 under revised indicators. The index is not a safety certification, but it is useful evidence about how little public information is available on training data, compute, downstream use, and societal impact for systems that downstream users increasingly depend on.
Safety guidance is becoming more specific. NIST's January 2025 second public draft on managing misuse risk for dual-use foundation models describes voluntary practices across the AI lifecycle and adds attention to model evaluations, cyber risk, chemical and biological risk, marginal risk, open models, and supply-chain actors. The 2026 International AI Safety Report similarly treats open-weight releases as a distinct governance problem because weights cannot be recalled once broadly released and safeguards are easier to remove.
The market context also changed. Open-weight releases, closed hosted APIs, multimodal systems, reasoning models, coding agents, world models, enterprise model routers, and on-device deployments all use foundation models differently. A single term now covers many release routes and risk profiles, so current claims should identify the exact model, version, modality, access path, evaluation setting, and deployment context.
How It Works
Pretraining. A base model learns from large corpora of text, code, images, audio, video, sensor data, or mixed modalities. The training objective is often generic, such as predicting missing or next tokens, matching images to text, reconstructing masked inputs, or learning useful latent representations.
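To make "predicting next tokens" concrete, the toy sketch below scores a tiny corpus with a count-based bigram model and reports the average negative log-likelihood, the quantity a next-token objective tries to reduce. Real foundation models use neural networks and vastly larger corpora; the corpus and function names here are invented for illustration.

```python
import math
from collections import Counter, defaultdict

corpus = "the model reads the text and the model predicts the next token".split()

# Count how often each token follows each preceding token.
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1


def next_token_prob(prev: str, nxt: str) -> float:
    """Estimate P(next | prev) from raw counts, with no smoothing."""
    total = sum(follow_counts[prev].values())
    return follow_counts[prev][nxt] / total if total else 0.0


def average_nll(tokens) -> float:
    """Average negative log-likelihood of each token given its predecessor.

    Only safe on sequences whose pairs were all seen during counting,
    because an unseen pair has probability 0 and log(0) is undefined.
    """
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(math.log(next_token_prob(p, n)) for p, n in pairs) / len(pairs)


print(round(average_nll(corpus), 3))
```

The objective is generic in the sense the paragraph describes: nothing in it mentions a downstream task, yet lowering this loss forces the model to capture regularities in the data.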
Adaptation. The pretrained model is adapted through prompts, supervised fine-tuning, reinforcement learning from human or AI feedback, direct preference optimization, retrieval-augmented generation, tool calling, or task-specific heads.
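Retrieval-augmented generation is one of the lighter adaptation routes, because it changes the model's input and leaves the weights alone. The sketch below uses a toy word-overlap retriever and a hypothetical prompt template; production systems typically use embedding search, and no real model is called here.

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many query words they share (toy lexical retrieval)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble a prompt that places retrieved context before the question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Invented example documents.
docs = [
    "The refund policy allows returns within 30 days.",
    "Shipping takes five business days.",
]
prompt = build_prompt("What is the refund policy?", docs)
print(prompt)
```

The same base model would answer differently with a different document store, which is why the retrieval corpus belongs in the governance record alongside the model version.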
Release and access. The same base can be offered through a hosted API, a consumer application, enterprise deployment, research preview, gated download, open-weight checkpoint, or full open-source-style release. The access route changes who can audit, modify, monitor, patch, or recall the model.
System wrapping. In deployment, the model is rarely alone. It is surrounded by prompts, safety policies, retrieval databases, memory, UI constraints, logging, permissions, model routers, moderation layers, human review, and product incentives.
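A small sketch can show how a wrapper changes what the user actually interacts with. Everything here is hypothetical: `stub_model` stands in for a real model call, and the blocked-term list is a placeholder for a real moderation layer.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("deployment")

BLOCKED_TERMS = {"password"}  # hypothetical policy list


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; echoes the end of the prompt."""
    return f"(model output for: {prompt[-40:]})"


def wrapped_call(user_input: str, system_prompt: str, model=stub_model) -> str:
    """Apply an input policy, prepend the system prompt, call the model, and log."""
    if any(term in user_input.lower() for term in BLOCKED_TERMS):
        log.info("blocked request")
        return "Request declined by deployment policy."
    output = model(f"{system_prompt}\n\nUser: {user_input}")
    log.info("served request, %d output chars", len(output))
    return output


print(wrapped_call("Summarize this memo.", "You are a careful assistant."))
print(wrapped_call("Tell me the admin password.", "You are a careful assistant."))
```

Two deployments of the same base model with different system prompts, policies, or logging would behave differently and leave different audit trails, so an evaluation of the base model alone does not describe either one.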
Reuse. The same base capability can appear in search, coding, education, healthcare, finance, robotics, customer service, advertising, creative tools, military systems, and public administration. The foundation becomes a shared substrate for many social contexts.
Why It Matters
Foundation models turn AI capability into infrastructure. Instead of building a separate model for each task, institutions build on top of a reusable base. This accelerates product development and research, but it also means that flaws in the base model can propagate into many systems.
The model becomes a dependency layer. Downstream actors may depend on upstream providers for pricing, uptime, safety updates, model behavior, content filters, licenses, data-retention policies, context windows, and access to weights. A model update can change many products at once.
Foundation models also blur responsibility. When an AI system harms someone, the relevant chain may include training data suppliers, model developers, fine-tuners, cloud providers, API vendors, application developers, deployers, prompt designers, retrieval databases, tool providers, and human operators. The more general the base model, the harder it becomes to say where responsibility begins and ends unless the system record preserves the chain.
The same generality that creates leverage also creates governance pressure. A base model can be recontextualized into an educational tutor, medical assistant, coding agent, hiring tool, search system, fraud detector, companion, or military workflow. The question is not only what the model can do, but who is allowed to adapt it and what evidence must travel with that adaptation.
Governance and Safety
Foundation-model governance must operate upstream and downstream at the same time. Upstream governance asks how the base model was trained, evaluated, secured, documented, licensed, and released. Downstream governance asks how the model is used in a specific product or institution.
The EU AI Act addresses this by imposing obligations on providers of general-purpose AI models, including documentation, information for downstream providers, copyright-policy duties, and summaries of training content. Models with systemic risk face additional expectations around evaluation, risk assessment, incident reporting, cybersecurity, and safety mitigations.
Safety must be evaluated at several layers. Base-model evaluations can test broad capabilities, memorization, bias, unsafe assistance, cyber or biosecurity misuse, persuasion, privacy leakage, robustness, and model-weight security. System evaluations must test the actual deployed stack: prompts, tools, retrieval sources, permissions, monitoring, user interface, human review, and the population affected. A model that is acceptable in one context can be unacceptable in another.
Documentation should include model cards or system cards, training-data summaries, evaluation methodology, known limitations, post-training methods, release restrictions, incident processes, security assumptions, and downstream integration guidance. Documentation does not solve governance, but without it downstream users and regulators are forced to govern an infrastructure layer they cannot see.
Release governance should distinguish closed API access, gated access, research access, open-weight release, and full open-source-style release. Open weights can support accountability, competition, and local control, but they can also make safety mitigations easier to remove and make recall difficult. Closed systems can preserve more centralized control, but they can also hide evidence, concentrate power, and make independent evaluation harder.
Downstream governance should require an AI system inventory, procurement record, model or system card, data-processing terms, evaluation report, incident channel, human-oversight design, and update policy before high-impact use. Foundation-model governance fails when the base model is documented but the deployment wrapper, retrieval corpus, tool permissions, logging, and affected-person recourse are not.
Minimum Foundation-Model Record
A foundation-model record should make it possible to track a capability from pretraining through post-training, release, integration, and deployment. The record can have public, customer, regulator, and security-restricted layers, but some accountable record should exist for each material claim.
- Model identity: model name, version, provider, release date, modalities, context or input limits, base-model lineage, adaptation lineage, license, access route, and whether weights are closed, hosted, gated, open-weight, or fully open.
- Training and data: training-data summary, data provenance constraints, filtering methods, synthetic-data use, data-enrichment labor, copyright policy, privacy assumptions, and known exclusions or blind spots.
- Post-training and adaptation: instruction tuning, preference training, safety tuning, fine-tunes, adapters, quantization, distillation, retrieval systems, tool-use scaffolds, model routers, and downstream modifications.
- Evaluation evidence: capability benchmarks, safety evaluations, red-team results, contamination checks, domain tests, uncertainty, failed tests, version dates, and whether tests covered the base model, adapted model, API surface, or full deployed system.
- Release and security controls: access tier, model-weight security, rate limits, monitoring, abuse reporting, vulnerability disclosure, incident response, recall or rollback options, and open-weight risk assessment where applicable.
- Downstream obligations: integration guidance, prohibited or unsupported uses, human-oversight expectations, logging requirements, data-retention terms, user notice, appeal routes, and update triggers for customers and deployers.
This minimum record connects foundation models to model and system cards, AI system inventories, data provenance, AI bills of materials, evaluations, and audit trails. Without this connective tissue, responsibility gets lost between the upstream provider and the downstream deployer.
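The six record sections could be held in a simple structure that reports which sections are still empty. This is a sketch only: the class name, field names, and example values are assumptions made for illustration, not a published schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FoundationModelRecord:
    """Six sections mirroring the list above; each maps an item name to evidence."""
    model_identity: dict = field(default_factory=dict)
    training_and_data: dict = field(default_factory=dict)
    post_training_and_adaptation: dict = field(default_factory=dict)
    evaluation_evidence: dict = field(default_factory=dict)
    release_and_security_controls: dict = field(default_factory=dict)
    downstream_obligations: dict = field(default_factory=dict)

    def empty_sections(self) -> list[str]:
        """Names of sections with no entries, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values; not a real model or report.
record = FoundationModelRecord(
    model_identity={"name": "example-model", "version": "1.0"},
    evaluation_evidence={"red_team": "placeholder report reference"},
)
print(record.empty_sections())
```

An empty-section check is a floor, not a quality measure. A section can be filled in and still be vague, outdated, or about the wrong layer.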
Risk Pattern
Inherited harm. Bias, toxicity, memorization, copyright exposure, privacy leakage, unsafe capabilities, and benchmark contamination can travel from the base model into many downstream systems.
Capability overhang. A base model may contain abilities that are not obvious until new prompting, tools, fine-tuning, retrieval, or scaffolding unlocks them.
Opacity. Developers often disclose limited information about training data, labor pipelines, compute, model architecture, evaluations, and post-training changes, especially for commercial frontier models.
Centralization. Training frontier foundation models requires large amounts of data, compute, capital, engineering, energy, and distribution. This can concentrate power in a small number of labs and cloud platforms.
Downstream mismatch. A model trained for broad usefulness may be embedded in contexts with legal, medical, educational, financial, military, or emotional stakes that its general training did not adequately cover.
Release-route mismatch. A model released for research can become a production dependency; an open-weight checkpoint can be fine-tuned into a higher-risk system; a hosted API can change behavior without downstream users updating their documentation.
Supply-chain fragility. Foundation-model systems depend on datasets, weights, tokenizers, adapters, guard models, model routers, inference providers, vector stores, evaluation harnesses, and monitoring tools. A failure or compromise in one layer can change the deployed system.
False neutrality. Because foundation models are general-purpose, providers may present them as neutral infrastructure. But choices about data, filtering, refusal behavior, licensing, availability, and deployment defaults are political and institutional choices.
Evaluation gap. Pre-deployment benchmarks and red-team tests can miss real-world risks after a model is wrapped in tools, retrieval, memory, user-specific data, or agent loops. The 2026 International AI Safety Report frames this as a technical and institutional challenge for general-purpose AI risk management.
Source Discipline
Claims about foundation models should identify the exact source type. A research paper, model card, system card, benchmark report, official product announcement, regulator guidance, transparency index, license file, and independent audit each support different claims.
For model facts, name the model version, release date, modality, access route, and whether the claim concerns a base model, instruction-tuned model, open-weight checkpoint, hosted API, product wrapper, or agent scaffold. For benchmark claims, name the evaluation harness, prompting or tool setup, date, and whether the result includes retrieval, code execution, sampling, or other scaffolding.
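The benchmark checklist above lends itself to a mechanical completeness check before a claim is cited. The sketch below assumes a claim is stored as a plain dictionary; the field names are hypothetical labels for the items listed in the paragraph.

```python
REQUIRED_BENCHMARK_FIELDS = (
    "model_version",
    "evaluation_harness",
    "prompting_or_tool_setup",
    "date",
    "scaffolding",  # retrieval, code execution, sampling, or "none" if absent
)


def missing_fields(claim: dict) -> list[str]:
    """Return required fields that are absent or blank in a benchmark claim."""
    missing = []
    for name in REQUIRED_BENCHMARK_FIELDS:
        value = claim.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Placeholder claim; the values are invented.
claim = {
    "model_version": "example-model-1.0",
    "evaluation_harness": "in-house harness",
    "date": "2026-01-15",
}
print(missing_fields(claim))
```

Requiring an explicit "none" for scaffolding, rather than allowing the field to be omitted, distinguishes a result known to be unscaffolded from one where nobody recorded the setup.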
For governance claims, prefer primary sources: the AI Act text, European Commission GPAI guidance, NIST publications, official safety frameworks, standards-body documents, published model/system cards, and documented incident reports. Treat company transparency reports and model cards as provider claims unless independently audited or backed by reproducible evidence.
Do not convert one evidentiary status into another. A Stanford transparency score describes public disclosure, not model safety. A NIST profile or draft guidance describes risk-management practice, not legal compliance by itself. A model card describes what the provider reports. A regulator filing, audit, or incident record answers a different question.
When legal terms are at issue, quote the legal instrument or official guidance rather than translating loosely from technical vocabulary. "Foundation model," "general-purpose AI model," "general-purpose AI model with systemic risk," "open-weight model," and "open-source AI" are related but not interchangeable.
Avoid turning "foundation model" into a loose synonym for "AI." A small task-specific classifier, a rules-based product, a fine-tuned downstream model, and a full deployed AI service may all be AI, but they raise different evidence and accountability questions.
Spiralist Reading
Foundation models are the reusable substrate beneath many Mirror interfaces.
They take large collections of data, labor, computation, and institutional choice, then return them as reusable capability. The user sees an assistant, search box, coding agent, tutor, companion, or creative tool. Beneath that surface is a model whose training and post-training decisions may be only partially visible.
For Spiralism, the central danger is foundation without accountability. A model can become a hidden public utility while remaining privately governed, partially documented, and optimized for incentives the public cannot inspect. The foundation then shapes attention, work, memory, knowledge, and authority while presenting itself as a neutral service layer.
The constructive task is not to reject foundation models. It is to demand source discipline around them: provenance, disclosure, evaluation, appeal, public-interest alternatives, strong security, downstream accountability, and the preservation of cognitive sovereignty for people who live on top of these systems.
Open Questions
- How should law distinguish foundation-model provider responsibility from downstream deployer responsibility?
- What information can be made public without creating security risk or exposing legitimate trade secrets?
- Can open-weight foundation models support accountability and competition without making dangerous capabilities too easy to misuse?
- How should model updates be communicated when many downstream systems depend on stable behavior?
- What public or nonprofit foundation-model infrastructure would reduce dependence on private defaults?
- Which foundation-model facts should be public by default: training data summary, compute, evaluations, incidents, energy and water use, downstream use, or model-weight security practices?
Related Pages
- Transformer Architecture
- BERT
- Scaling Laws
- Pretraining
- Post-Training
- Reinforcement Learning from Human Feedback
- Training Data
- AI Data Provenance
- AI Compute
- Multimodal AI
- Reasoning Models
- World Models and Spatial Intelligence
- Open-Weight AI Models
- Model Cards and System Cards
- AI System Inventory
- AI Procurement
- AI Bill of Materials
- AI Evaluations
- AI Audits and Third-Party Assurance
- AI Red Teaming
- Frontier AI Safety Frameworks
- EU AI Act
- NIST AI Risk Management Framework
- Model Weight Security
- Secure AI System Development
- AI Incident Reporting
- AI Agent Identity
- AI Agent Observability
- Capability Elicitation
- Retrieval-Augmented Generation
- AI Liability and Accountability
- Algorithmic Impact Assessments
- Sovereign AI
- Percy Liang
- AI Organizations
- Emily M. Bender
Sources
- Stanford CRFM, On the Opportunities and Risks of Foundation Models, 2021; reviewed June 25, 2026.
- Rishi Bommasani et al., On the Opportunities and Risks of Foundation Models, arXiv, 2021.
- Stanford HAI, Introducing The Foundation Model Transparency Index, October 18, 2023.
- Stanford CRFM, Foundation Model Transparency Index, December 2025.
- Stanford HAI, Transparency in AI is on the Decline, December 2025.
- NIST, AI Risk Management Framework, reviewed June 25, 2026.
- NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, July 26, 2024; reviewed June 25, 2026.
- NIST, Updated Guidelines for Managing Misuse Risk for Dual-Use Foundation Models, January 15, 2025; reviewed June 25, 2026.
- NIST, Managing Misuse Risk for Dual-Use Foundation Models, NIST AI 800-1, second public draft, January 2025.
- NIST, AI Test, Evaluation, Validation and Verification, reviewed June 25, 2026.
- International AI Safety Report, International AI Safety Report 2026, February 2026; reviewed June 25, 2026.
- EUR-Lex, Regulation (EU) 2024/1689, Artificial Intelligence Act, official text.
- European Commission, Guidelines for providers of general-purpose AI models, reviewed June 25, 2026.
- European Commission, General-Purpose AI Models in the AI Act: Questions and Answers, reviewed June 25, 2026.
- European Commission, General-purpose AI obligations under the AI Act, reviewed June 25, 2026.
- European Commission, The General-Purpose AI Code of Practice, published July 10, 2025; reviewed June 25, 2026.
- Open Source Initiative, The Open Source AI Definition 1.0, reviewed June 25, 2026.