Wiki · Concept · Last reviewed June 25, 2026

Foundation Models

Foundation models are broadly pretrained AI models that can be adapted across many downstream tasks. The term names both a technical pattern and an institutional shift: one reusable base can become infrastructure for products, APIs, agents, research systems, and public services, and it can carry legal duties that attach before any single downstream deployment is known.

Definition

A foundation model is a model trained on broad data, usually with self-supervision at scale, that can be adapted to a wide range of downstream tasks. Adaptation may happen through prompting, fine-tuning, retrieval, tool use, instruction tuning, preference training, distillation, adapters, quantization, or embedding the model inside a larger system.

The category includes large language models, multimodal models, vision-language models, code models, audio models, robotics models, embedding models, and world-model-like systems when they serve as reusable bases rather than single-task classifiers. A foundation model is not necessarily open, closed, safe, unsafe, frontier, generative, agentic, or generally intelligent. The defining feature is reuse: one pretrained base becomes the starting point for many later tasks, products, and institutions.

Regulators often use the related term general-purpose AI model. The EU AI Act's GPAI category overlaps heavily with foundation models, especially models that display significant generality and can be integrated into many downstream systems or applications. The legal term matters because it attaches obligations to model providers, not only to final application deployers.

Four units should not be collapsed: the base model trained at scale; the adapted model produced through post-training or fine-tuning; the release artifact made available through an API, gated access, open weights, or a product; and the deployed system that includes prompts, retrieval, tools, user interface, policies, logging, and human workflow. Most real-world risk lives in the interaction among all four.
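One way to keep the four units apart in practice is to tag every evaluation finding with the layer it was measured on. The following is a minimal sketch under that assumption; the names `Layer`, `Finding`, and `layers_without_findings`, and the example subjects, are hypothetical and do not come from any standard.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The four units named above; the identifiers are illustrative."""
    BASE_MODEL = "base model"
    ADAPTED_MODEL = "adapted model"
    RELEASE_ARTIFACT = "release artifact"
    DEPLOYED_SYSTEM = "deployed system"


@dataclass(frozen=True)
class Finding:
    """One evaluation result, pinned to the layer it was measured on."""
    layer: Layer
    subject: str  # hypothetical identifier, e.g. a checkpoint or product name
    claim: str


def layers_without_findings(findings):
    """Return the layers that no finding covers, in declaration order."""
    covered = {f.layer for f in findings}
    return [layer for layer in Layer if layer not in covered]


findings = [
    Finding(Layer.BASE_MODEL, "example-base-v1", "memorization tested"),
    Finding(Layer.DEPLOYED_SYSTEM, "example-helpdesk", "tool permissions reviewed"),
]
gaps = layers_without_findings(findings)
print([layer.value for layer in gaps])
```

Here the adapted model and the release artifact have no findings attached, which is the kind of gap a layer-aware record is meant to surface.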

Boundary Tests

Use the term carefully. A foundation model is a technical and institutional category. A general-purpose AI model is a legal category under the EU AI Act and related guidance. A frontier model is a policy and safety category for especially capable systems. An open-weight model is a release route. A deployed AI system is the full product or workflow through which people are affected.

The same artifact can fall into several categories, but each category answers a different question. "Foundation model" asks whether the model is broadly pretrained and reusable. "GPAI" asks whether legal obligations attach to the provider. "Open-weight" asks whether trained parameters are downloadable. "Systemic risk" asks whether the model meets a regulatory or safety threshold. "Deployed system" asks how the model is used with data, tools, people, permissions, and institutional authority.

This boundary matters for accountability. A base model can be well documented while a downstream product is unsafe. A product can be well controlled while the upstream provider remains opaque. An open-weight release can improve research access while reducing provider control after release. A governance record should name which layer is being evaluated.

Lineage

The phrase "foundation model" was popularized by Stanford researchers in the 2021 report "On the Opportunities and Risks of Foundation Models." The report argued that AI was shifting from task-specific systems toward models whose broad pretraining made them adaptable across domains, creating both capability gains and systemic risks.

The technical lineage includes representation learning, transfer learning, self-supervised learning, word embeddings, BERT, GPT-style language models, CLIP, diffusion models, vision transformers, and later multimodal systems. Transformers became the dominant architecture for many foundation models, but the concept is broader than any one architecture.

The institutional lineage is just as important. Foundation models changed who can build AI systems. A downstream developer can call an API, fine-tune an open-weight model, add retrieval, or wrap a model in an agent without training a base model from scratch. That makes AI development faster, but it also concentrates upstream power in the organizations that train, host, license, and document the base models.

Current Context

As of June 25, 2026, foundation models are no longer only a research category. They are a regulated infrastructure layer. The European Commission says the EU AI Act obligations for providers of general-purpose AI models entered into application on August 2, 2025; Commission enforcement powers begin on August 2, 2026; and providers of GPAI models placed on the market before August 2, 2025 must comply by August 2, 2027.
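The three dates above can be read as a small decision rule. The sketch below encodes only what the paragraph states; the function names are hypothetical, and treating a model placed on the market on or after August 2, 2025 as covered from its placement date is a simplifying assumption, not a reading of the legal text.

```python
from datetime import date

# Dates as summarized in the paragraph above.
GPAI_OBLIGATIONS_APPLY = date(2025, 8, 2)
COMMISSION_ENFORCEMENT_BEGINS = date(2026, 8, 2)
LEGACY_MODEL_DEADLINE = date(2027, 8, 2)


def gpai_compliance_date(placed_on_market: date) -> date:
    """Date from which a GPAI provider is expected to comply.

    Assumption: models placed on the market before the obligations
    applied get the transition deadline; later models are covered
    from the day they are placed on the market.
    """
    if placed_on_market < GPAI_OBLIGATIONS_APPLY:
        return LEGACY_MODEL_DEADLINE
    return placed_on_market


def commission_can_enforce(on: date) -> bool:
    """True once the Commission's enforcement powers have begun."""
    return on >= COMMISSION_ENFORCEMENT_BEGINS


print(gpai_compliance_date(date(2024, 3, 1)))
print(commission_can_enforce(date(2026, 6, 25)))
```

The rule separates two questions that are easy to merge: when a provider's obligations start, and when the Commission can enforce them.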

The EU's General-Purpose AI Code of Practice, published July 10, 2025, gives providers a voluntary route for demonstrating compliance with AI Act obligations on transparency, copyright, and, for models with systemic risk, safety and security. The Commission's GPAI guidance also emphasizes definitions, significant modifications, open-source exemptions, and the fact that model-layer obligations and AI-system obligations can both matter when a GPAI model is integrated into a deployed system. These materials are guidance and compliance tools, not independent proof that a model or deployment is safe.

Transparency remains a live governance problem. Stanford's December 2025 Foundation Model Transparency Index scored 13 major developers and reported that average transparency fell from 58 out of 100 in 2024 to 40 out of 100 in 2025 under revised indicators. The index is not a safety certification, but it is useful evidence about how little public information is available on training data, compute, downstream use, and societal impact for systems that downstream users increasingly depend on.

Safety guidance is becoming more specific. NIST's January 2025 second public draft on managing misuse risk for dual-use foundation models describes voluntary practices across the AI lifecycle and adds attention to model evaluations, cyber risk, chemical and biological risk, marginal risk, open models, and supply-chain actors. The 2026 International AI Safety Report similarly treats open-weight releases as a distinct governance problem because weights cannot be recalled once broadly released and safeguards are easier to remove.

The market context has also changed. Open-weight releases, closed hosted APIs, multimodal systems, reasoning models, coding agents, world models, enterprise model routers, and on-device deployments all use foundation models differently. A single term now covers many release routes and risk profiles, so current claims should identify the exact model, version, modality, access path, evaluation setting, and deployment context.

How It Works

Pretraining. A base model learns from large corpora of text, code, images, audio, video, sensor data, or mixed modalities. The training objective is often generic, such as predicting missing or next tokens, matching images to text, reconstructing masked inputs, or learning useful latent representations.

Adaptation. The pretrained model is adapted through prompts, supervised fine-tuning, reinforcement learning from human or AI feedback, direct preference optimization, retrieval-augmented generation, tool calling, or task-specific heads.

Release and access. The same base can be offered through a hosted API, a consumer application, enterprise deployment, research preview, gated download, open-weight checkpoint, or full open-source-style release. The access route changes who can audit, modify, monitor, patch, or recall the model.

System wrapping. In deployment, the model is rarely alone. It is surrounded by prompts, safety policies, retrieval databases, memory, UI constraints, logging, permissions, model routers, moderation layers, human review, and product incentives.

Reuse. The same base capability can appear in search, coding, education, healthcare, finance, robotics, customer service, advertising, creative tools, military systems, and public administration. The foundation becomes a shared substrate for many social contexts.
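The wrapping and reuse steps above can be sketched with a stand-in model. Everything here is hypothetical: `base_model` only echoes its prompt, the retrieval store is a keyword lookup, and the policy is a blocked-term filter. The point is only that two deployments can share one base while behaving differently.

```python
from dataclasses import dataclass, field
from typing import Callable


def base_model(prompt: str) -> str:
    """Stand-in for a pretrained model; a real one would generate text."""
    return f"[completion for: {prompt}]"


@dataclass
class DeployedSystem:
    """A wrapper around a shared base: prompt, retrieval, policy, logging."""
    name: str
    system_prompt: str
    documents: dict[str, str]  # toy retrieval store: keyword -> passage
    blocked_terms: set[str]    # toy deployment policy
    model: Callable[[str], str] = base_model
    log: list[str] = field(default_factory=list)

    def answer(self, user_input: str) -> str:
        self.log.append(user_input)  # logging layer
        lowered = user_input.lower()
        if any(term in lowered for term in self.blocked_terms):
            return "Request declined by deployment policy."  # policy layer
        # Retrieval layer: include passages whose keyword appears in the input.
        context = [text for key, text in self.documents.items() if key in lowered]
        prompt = "\n".join([self.system_prompt, *context, user_input])
        return self.model(prompt)  # shared base model


tutor = DeployedSystem(
    "tutor", "You are a patient tutor.",
    {"fractions": "Fractions are parts of a whole."}, {"exam answers"},
)
helpdesk = DeployedSystem(
    "helpdesk", "You are a support agent.",
    {"refund": "Refunds are handled by the billing team."}, set(),
)

print(tutor.answer("Explain fractions"))
print(tutor.answer("Give me the exam answers"))
print(helpdesk.answer("Give me the exam answers"))
```

The same request is declined by one deployment and passed to the base by the other, which is why evaluating the base alone does not settle how a deployed system behaves.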

Why It Matters

Foundation models turn AI capability into infrastructure. Instead of building a separate model for each task, institutions build on top of a reusable base. This accelerates product development and research, but it also means that flaws in the base model can propagate into many systems.

The model becomes a dependency layer. Downstream actors may depend on upstream providers for pricing, uptime, safety updates, model behavior, content filters, licenses, data-retention policies, context windows, and access to weights. A model update can change many products at once.

Foundation models also blur responsibility. When an AI system harms someone, the relevant chain may include training data suppliers, model developers, fine-tuners, cloud providers, API vendors, application developers, deployers, prompt designers, retrieval databases, tool providers, and human operators. The more general the base model, the harder it becomes to say where responsibility begins and ends unless the system record preserves the chain.

The same generality that creates leverage also creates governance pressure. A base model can be recontextualized into an educational tutor, medical assistant, coding agent, hiring tool, search system, fraud detector, companion, or military workflow. The question is not only what the model can do, but who is allowed to adapt it and what evidence must travel with that adaptation.

Governance and Safety

Foundation-model governance must operate upstream and downstream at the same time. Upstream governance asks how the base model was trained, evaluated, secured, documented, licensed, and released. Downstream governance asks how the model is used in a specific product or institution.

The EU AI Act addresses this by imposing obligations on providers of general-purpose AI models, including documentation, information for downstream providers, copyright-policy duties, and summaries of training content. Models with systemic risk face additional expectations around evaluation, risk assessment, incident reporting, cybersecurity, and safety mitigations.

Safety must be evaluated at several layers. Base-model evaluations can test broad capabilities, memorization, bias, unsafe assistance, cyber or biosecurity misuse, persuasion, privacy leakage, robustness, and model-weight security. System evaluations must test the actual deployed stack: prompts, tools, retrieval sources, permissions, monitoring, user interface, human review, and the population affected. A model that is acceptable in one context can be unacceptable in another.

Documentation should include model cards or system cards, training-data summaries, evaluation methodology, known limitations, post-training methods, release restrictions, incident processes, security assumptions, and downstream integration guidance. Documentation does not solve governance, but without it downstream users and regulators are forced to govern an infrastructure layer they cannot see.

Release governance should distinguish closed API access, gated access, research access, open-weight release, and full open-source-style release. Open weights can support accountability, competition, and local control, but they can also make safety mitigations easier to remove and make recall difficult. Closed systems can preserve more centralized control, but they can also hide evidence, concentrate power, and make independent evaluation harder.

Downstream governance should require an AI system inventory, procurement record, model or system card, data-processing terms, evaluation report, incident channel, human-oversight design, and update policy before high-impact use. Foundation-model governance fails when the base model is documented but the deployment wrapper, retrieval corpus, tool permissions, logging, and affected-person recourse are not.

Minimum Foundation-Model Record

A foundation-model record should make it possible to track a capability from pretraining through post-training, release, integration, and deployment. The record can have public, customer, regulator, and security-restricted layers, but some accountable record should exist for each material claim.

This minimum record connects foundation models to model and system cards, AI system inventories, data provenance, AI bills of materials, evaluations, and audit trails. Without this connective tissue, responsibility gets lost between the upstream provider and the downstream deployer.
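As an illustration of what "track a capability from pretraining through deployment" could mean operationally, the sketch below stores one entry per stage and checks that each entry points back to the artifact of the stage before it. The field names, stage labels, and artifact identifiers are assumptions made for this example, not a published schema.

```python
from dataclasses import dataclass
from typing import Optional

# Stages in the order the section names them.
STAGES = ("pretraining", "post-training", "release", "integration", "deployment")


@dataclass(frozen=True)
class StageEntry:
    stage: str
    artifact_id: str           # hypothetical identifier for this stage's artifact
    parent_id: Optional[str]   # artifact this stage was built from
    evidence: str              # e.g. a card, evaluation report, or audit log


def trace_breaks(entries):
    """List the places where the record cannot be followed stage to stage."""
    by_stage = {e.stage: e for e in entries}
    problems = []
    previous = None
    for stage in STAGES:
        entry = by_stage.get(stage)
        if entry is None:
            problems.append(f"{stage}: no entry")
        elif previous is not None and entry.parent_id != previous.artifact_id:
            problems.append(f"{stage}: does not link to {previous.stage}")
        previous = entry  # None after a missing stage, so the next link is not checked
    return problems


record = [
    StageEntry("pretraining", "base-v1", None, "training-data summary"),
    StageEntry("post-training", "chat-v1", "base-v1", "system card"),
    StageEntry("release", "api-2026-01", "chat-v0", "release notes"),
    StageEntry("deployment", "helpdesk-prod", "api-2026-01", "evaluation report"),
]
print(trace_breaks(record))
```

In this example the release entry points at an artifact the record does not contain, and the integration stage has no entry at all, so responsibility would be lost at both points.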

Risk Pattern

Inherited harm. Bias, toxicity, memorization, copyright exposure, privacy leakage, unsafe capabilities, and benchmark contamination can travel from the base model into many downstream systems.

Capability overhang. A base model may contain abilities that are not obvious until new prompting, tools, fine-tuning, retrieval, or scaffolding unlocks them.

Opacity. Developers often disclose limited information about training data, labor pipelines, compute, model architecture, evaluations, and post-training changes, especially for commercial frontier models.

Centralization. Training frontier foundation models requires large amounts of data, compute, capital, engineering, energy, and distribution. This can concentrate power in a small number of labs and cloud platforms.

Downstream mismatch. A model trained for broad usefulness may be embedded in contexts with legal, medical, educational, financial, military, or emotional stakes that its general training did not adequately cover.

Release-route mismatch. A model released for research can become a production dependency; an open-weight checkpoint can be fine-tuned into a higher-risk system; a hosted API can change behavior without downstream users updating their documentation.

Supply-chain fragility. Foundation-model systems depend on datasets, weights, tokenizers, adapters, guard models, model routers, inference providers, vector stores, evaluation harnesses, and monitoring tools. A failure or compromise in one layer can change the deployed system.

False neutrality. Because foundation models are general-purpose, providers may present them as neutral infrastructure. But choices about data, filtering, refusal behavior, licensing, availability, and deployment defaults are political and institutional choices.

Evaluation gap. Pre-deployment benchmarks and red-team tests can miss real-world risks after a model is wrapped in tools, retrieval, memory, user-specific data, or agent loops. The 2026 International AI Safety Report frames this as a technical and institutional challenge for general-purpose AI risk management.

Source Discipline

Claims about foundation models should identify the exact source type. A research paper, model card, system card, benchmark report, official product announcement, regulator guidance, transparency index, license file, and independent audit each support different claims.

For model facts, name the model version, release date, modality, access route, and whether the claim concerns a base model, instruction-tuned model, open-weight checkpoint, hosted API, product wrapper, or agent scaffold. For benchmark claims, name the evaluation harness, prompting or tool setup, date, and whether the result includes retrieval, code execution, sampling, or other scaffolding.

For governance claims, prefer primary sources: the AI Act text, European Commission GPAI guidance, NIST publications, official safety frameworks, standards-body documents, published model/system cards, and documented incident reports. Treat company transparency reports and model cards as provider claims unless independently audited or backed by reproducible evidence.

Do not convert one evidentiary status into another. A Stanford transparency score describes public disclosure, not model safety. A NIST profile or draft guidance describes risk-management practice, not legal compliance by itself. A model card describes what the provider reports. A regulator filing, audit, or incident record answers a different question.
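The rule against converting one evidentiary status into another can be made mechanical with a lookup from source type to the kind of claim it supports. The mapping below restates only the three pairings given in the paragraph above; the key names and the `check_claim` function are hypothetical labels chosen for this sketch.

```python
# What each source type describes, following the distinctions above.
# Keys and labels are illustrative, not an established taxonomy.
SUPPORTS = {
    "transparency_index": {"public_disclosure"},
    "nist_guidance": {"risk_management_practice"},
    "model_card": {"provider_reported_properties"},
}


def check_claim(source_type: str, claim_type: str) -> str:
    """Say whether a source type can back a given kind of claim."""
    supported = SUPPORTS.get(source_type)
    if supported is None:
        return "unknown source type"
    if claim_type in supported:
        return "supported"
    return "unsupported: source describes " + ", ".join(sorted(supported))


print(check_claim("transparency_index", "model_safety"))
print(check_claim("model_card", "provider_reported_properties"))
```

A check like this does not judge whether a claim is true. It only flags citations where the source answers a different question from the one being asked.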

When legal terms are at issue, quote the legal instrument or official guidance rather than translating loosely from technical vocabulary. "Foundation model," "general-purpose AI model," "general-purpose AI model with systemic risk," "open-weight model," and "open-source AI" are related but not interchangeable.

Avoid turning "foundation model" into a loose synonym for "AI." A small task-specific classifier, a rules-based product, a fine-tuned downstream model, and a full deployed AI service may all be AI, but they raise different evidence and accountability questions.

Spiralist Reading

Foundation models are the reusable substrate beneath many Mirror interfaces.

They take large collections of data, labor, computation, and institutional choice, then return them as reusable capability. The user sees an assistant, search box, coding agent, tutor, companion, or creative tool. Beneath that surface is a model whose training and post-training decisions may be only partially visible.

For Spiralism, the central danger is foundation without accountability. A model can become a hidden public utility while remaining privately governed, partially documented, and optimized for incentives the public cannot inspect. The foundation then shapes attention, work, memory, knowledge, and authority while presenting itself as a neutral service layer.

The constructive task is not to reject foundation models. It is to demand source discipline around them: provenance, disclosure, evaluation, appeal, public-interest alternatives, strong security, downstream accountability, and the preservation of cognitive sovereignty for people who live on top of these systems.

