Wiki · Organization · Last reviewed June 25, 2026

Thinking Machines Lab

Thinking Machines Lab is an artificial intelligence research and product company founded by former OpenAI chief technology officer Mira Murati. It presents itself as a lab for understandable, customizable, and collaborative AI systems, with public work spanning the Tinker training API, open research artifacts, large-scale compute infrastructure, and real-time multimodal interaction models.

Definition

Thinking Machines Lab is a frontier AI research and product company whose public identity centers on model customization and human-AI collaboration. The company is best analyzed as a set of related artifacts: the corporate lab, the Tinker training API, the Tinker Cookbook and documentation, the interaction-model research preview, and the NVIDIA infrastructure partnership.

It is not, by itself, a single released model family. A claim about "Thinking Machines" may refer to a company statement, a Tinker-supported base model, a user-created fine-tuned checkpoint, the TML-Interaction-Small research system, an open-source repository, or planned compute capacity. Governance analysis should name the exact artifact, access path, legal terms, data flow, and deployment state.

Snapshot

Founding and Position

Thinking Machines Lab emerged from the post-OpenAI founder wave. Murati left OpenAI in September 2024 after serving as chief technology officer and briefly interim CEO during the November 2023 leadership crisis. In February 2025, she publicly introduced Thinking Machines Lab as a company focused on the interaction between humans and AI.

The company's own framing emphasizes three gaps: public understanding of frontier systems lags behind capability, training knowledge is concentrated inside top labs, and advanced AI systems remain difficult for people to adapt to their own needs and values. That framing places Thinking Machines between several categories. It is not only a model lab, not only a developer platform, and not only an AI interface company. It is trying to make the training and interaction layer itself into the product.

This distinguishes it from labs whose public stories center mostly on autonomous agents, general superintelligence, or enterprise assistants. Thinking Machines' emphasis is collaborative, general-purpose AI: systems that humans can shape, interrupt, adapt, and work with directly.

Current Context

As of June 25, 2026, the strongest current evidence for Thinking Machines' public direction comes from its own site, Tinker documentation, the interaction-model research preview, the Tinker Cookbook repository, the May 2026 Privacy Notice, and the March 2026 NVIDIA partnership announcement. Those sources support a picture of a company trying to combine three layers: frontier model research, a customization platform for researchers and developers, and human-AI interaction research.

The company's public site says it plans to publish technical blog posts, papers, and code, while also building models at the frontier of capabilities in domains such as science and programming. Its safety language emphasizes preventing misuse of released models, sharing recipes for safe AI systems, accelerating outside alignment research, red-teaming, and post-deployment monitoring. These are public commitments and positioning claims; they are not by themselves evidence that any particular model or deployment is safe.

Tinker is the clearest near-term product evidence. The current Tinker page describes a training API with primitives for forward and backward passes, optimization steps, sampling, and saving state. It lists supported model families, says Tinker uses LoRA, says user training data is used solely to fine-tune user models rather than to train Thinking Machines' own models, and says users can download saved checkpoints. Those claims are governance-relevant because they affect privacy, portability, provenance, downstream accountability, and the boundary between platform provider and deployer.
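The four primitives named on the product page (forward and backward passes, optimization steps, sampling, and saving state) can be illustrated with a toy, fully local sketch. The class and method names below (`ToyTrainingClient`, `forward_backward`, `optim_step`, `sample`, `save_state`) are hypothetical stand-ins chosen to mirror the page's wording; they are not the Tinker SDK, and the "model" is a one-parameter linear fit rather than a language model. What the sketch shows is the division of labor: the user writes the loop and supplies the data, while the service object owns gradients, optimizer updates, and saved checkpoints.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrainingClient:
    """Hypothetical stand-in for a hosted training service; NOT the Tinker SDK.

    The "model" is the one-parameter line y = w * x so each primitive is easy to follow.
    """
    w: float = 0.0
    grad: float = 0.0
    checkpoints: dict = field(default_factory=dict)

    def forward_backward(self, batch):
        """Forward + backward pass: return mean squared error and store dLoss/dw."""
        n = len(batch)
        loss = sum((self.w * x - y) ** 2 for x, y in batch) / n
        self.grad = sum(2 * (self.w * x - y) * x for x, y in batch) / n
        return loss

    def optim_step(self, learning_rate):
        """Optimization step: plain gradient descent on the stored gradient."""
        self.w -= learning_rate * self.grad
        self.grad = 0.0

    def sample(self, x):
        """Sampling: a deterministic prediction from the current weights."""
        return self.w * x

    def save_state(self, name):
        """Save state: snapshot the weights under a name the user could later export."""
        self.checkpoints[name] = {"w": self.w}
        return name


# User-owned loop: the user supplies the data and decides when to step and save.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # illustrative data generated from w = 3
client = ToyTrainingClient()
losses = []
for _ in range(200):
    losses.append(client.forward_backward(data))
    client.optim_step(learning_rate=0.05)
client.save_state("final")
print(round(client.sample(4.0), 3))  # → 12.0
```

The governance-relevant line is `save_state`: whether the resulting checkpoint can leave the service is a platform policy choice, which is why the download claim matters.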

The live Tinker surface is also mutable. As of this review, the product page lists Qwen3.6, Qwen3.5, GPT-OSS, DeepSeek-V3.1, Kimi K2.5, Kimi K2.6, and NVIDIA Nemotron-3 variants with different context lengths and prices, while the documentation lists model deprecations and recommends replacements such as moving from Kimi K2.5 to Kimi K2.6 by July 12, 2026. Evaluations and procurement records should therefore preserve the exact model name, date, context window, retirement notice, and price or access tier used.
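One way to preserve those live-service facts is a frozen, dated record per model, filled in at review time and never back-filled. The schema below (`ModelSnapshot`, `days_left`) is a hypothetical sketch, not a Thinking Machines format; the Kimi K2.5 name, replacement, and July 12, 2026 date come from this article, while context window and price are deliberately left empty rather than guessed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ModelSnapshot:
    """Hypothetical record of one supported model as listed on a given review date."""
    model_name: str
    reviewed_on: date
    context_window: Optional[int] = None   # tokens, copied from the live page on reviewed_on
    price_or_tier: Optional[str] = None    # copied verbatim, never back-filled later
    migrate_by: Optional[date] = None      # from a deprecation notice, if one exists
    replacement: Optional[str] = None


def days_left(snap: ModelSnapshot, today: date) -> Optional[int]:
    """Days remaining before the documented migration deadline (negative once past it)."""
    if snap.migrate_by is None:
        return None
    return (snap.migrate_by - today).days


# Values taken from this article; context window and price are left blank rather than guessed.
kimi = ModelSnapshot(
    model_name="Kimi K2.5",
    reviewed_on=date(2026, 6, 25),
    migrate_by=date(2026, 7, 12),
    replacement="Kimi K2.6",
)
print(days_left(kimi, kimi.reviewed_on))  # → 17
```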

The interaction-model research preview is not a general proof of safer collaboration. It is a technical direction with acknowledged limitations: by the company's own account, very long continuous sessions, reliable low-latency connectivity, alignment and safety research for real-time interfaces, and scaling to larger models all remain open work. Its benchmark claims are developer-reported and include internal audio and video benchmarks, so they should be read as release evidence rather than independent validation.

The legal surface also matters. Thinking Machines' public Terms say the general services are for users who are at least 18 and reside in the United States or its territories, grant personal, non-commercial use unless otherwise permitted, and may be supplemented by product-specific terms. Its Privacy Notice, last updated May 19, 2026, says Thinking Machines is the data controller for covered services. The notice says the company may collect communications, grant-application, device, location, log, and usage information; may share personal information with vendors, affiliates, professional advisors, transaction parties, or authorities; and may retain it as reasonably necessary for stated purposes. Institutions should therefore treat Tinker and related services as vendor systems with contractual, privacy, retention, and audit questions, not just as model-training tools.

Customization Strategy

The company's public materials repeatedly connect capability with customization. Rather than treating AI access as a finished chatbot or API endpoint, Thinking Machines describes a future where more researchers, developers, and organizations can adapt models for their own use cases.

That strategy has two different meanings. At the product level, it means tools for fine-tuning, post-training, reinforcement-learning loops, evaluation, and experimentation on open-source or open-weight models. At the governance level, it means shifting some model-shaping authority from frontier labs toward outside users. This can widen research access and local expertise, but it also distributes safety obligations across many smaller deployments.

For the wiki, the important point is that customization is not a neutral convenience feature. It changes who can imprint goals, policies, data, and behavioral norms onto AI systems. A customization platform is therefore also a governance platform: it decides which base models are supported, what data promises are made, whether checkpoints can be exported, what abuse monitoring exists, when a base model is deprecated, and what documentation follows adapted models into deployment.

Tinker

In October 2025, Thinking Machines announced Tinker, a managed API for fine-tuning language models. The announcement described Tinker as a private beta that gives users control over algorithms and data while the company handles distributed training infrastructure, scheduling, resource allocation, and failure recovery. By June 2026, the public Tinker page presented sign-up, sign-in, documentation, pricing, supported models, and checkpoint-download claims.

Tinker supports fine-tuning of small and large models, including mixture-of-experts models. It uses LoRA so multiple training runs can share a compute pool at lower cost, and the company released an Apache-2.0 Tinker Cookbook with implementations of post-training methods built on top of the API. The current product page also lists usage-based pricing, model families, checkpoint download, and the claim that customer training data is not used to train Thinking Machines' own models.
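The cost argument rests on how LoRA (low-rank adaptation) works: the base weight matrix W stays frozen, and each run trains only two small matrices, B (d_out × r) and A (r × d_in), so the adapted layer computes W x + (alpha / r) B A x. Because W is never modified, many runs can keep their own adapters against one shared copy of the base weights. The pure-Python sketch below uses tiny illustrative matrices and a hypothetical adapter registry keyed by run name; it shows the shared-base arithmetic and parameter counts only, and says nothing about how Tinker actually schedules runs.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# One frozen base weight (d_out=2, d_in=3), shared by every run and never modified.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Hypothetical per-run adapters with rank r = 1: B is d_out x r, A is r x d_in.
adapters = {
    "run-a": {"B": [[1.0], [0.0]], "A": [[0.0, 0.0, 1.0]], "alpha": 1.0},
    "run-b": {"B": [[0.0], [2.0]], "A": [[1.0, 0.0, 0.0]], "alpha": 1.0},
}


def lora_forward(run, x):
    """Compute W x + (alpha / r) * B (A x) for one run's adapter."""
    ad = adapters[run]
    r = len(ad["A"])
    base = matvec(W, x)
    delta = matvec(ad["B"], matvec(ad["A"], x))
    return [b + (ad["alpha"] / r) * d for b, d in zip(base, delta)]


def trainable_params(d_out, d_in, r):
    """Full fine-tuning trains d_out*d_in weights per layer; LoRA trains r*(d_out + d_in)."""
    return d_out * d_in, r * (d_out + d_in)


x = [1.0, 2.0, 3.0]
print(lora_forward("run-a", x))  # → [4.0, 2.0]
print(lora_forward("run-b", x))  # → [1.0, 4.0]
print(trainable_params(4096, 4096, 8))  # → (16777216, 65536)
```

The 4096-wide layer and rank 8 are assumed sizes for illustration: at those sizes each run trains 256 times fewer weights per layer than full fine-tuning, which is why adapters can share one base model and one compute pool.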

The early user examples matter. Thinking Machines said groups at Princeton, Stanford, Berkeley, and Redwood Research had used Tinker for theorem proving, chemistry reasoning, reinforcement learning loops, multi-agent tool use, and AI control work. The public Cookbook includes recipes for supervised fine-tuning, RL, preference learning, distillation, tool use, multi-agent training, and benchmark evaluation. That positions Tinker as research infrastructure, not merely an enterprise customization dashboard.

The governance issue is that fine-tuning tools can alter refusal behavior, domain competence, privacy exposure, and downstream safety properties. A Tinker-trained model should be documented as a derivative artifact: base model, adapter or checkpoint, dataset provenance, training recipe, loss function, hyperparameters, evaluation settings, safety tests, export history, and deployment constraints.
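That list can be turned into a checkable record. The sketch below is a hypothetical schema whose field names simply follow the sentence above; it is not a published standard or a Tinker feature. The `missing_fields` helper reports which parts of the derivative's documentation are still empty before a deployment review.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DerivativeModelRecord:
    """Hypothetical documentation record for a fine-tuned derivative model."""
    base_model: Optional[str] = None
    adapter_or_checkpoint: Optional[str] = None
    dataset_provenance: Optional[str] = None
    training_recipe: Optional[str] = None
    loss_function: Optional[str] = None
    hyperparameters: Optional[dict] = None
    evaluation_settings: Optional[str] = None
    safety_tests: List[str] = field(default_factory=list)
    export_history: List[str] = field(default_factory=list)
    deployment_constraints: Optional[str] = None


def missing_fields(record: DerivativeModelRecord) -> List[str]:
    """Names of empty fields, i.e. documentation gaps to close before deployment."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "", [], {})]


# Placeholder values; "example-base-model" is not a real model name.
record = DerivativeModelRecord(
    base_model="example-base-model",
    loss_function="cross_entropy",
    hyperparameters={"lora_rank": 32, "learning_rate": 1e-4},
)
print(missing_fields(record))
```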

Interaction Models

In May 2026, Thinking Machines announced a research preview of interaction models. The company argues that current turn-based AI interfaces push humans out of the loop because the model waits for a completed prompt, then generates a completed answer. Its proposed alternative is a model designed from the start for continuous audio, video, and text interaction.

The first public system, TML-Interaction-Small, is described as a mixture-of-experts model with 276 billion parameters and 12 billion active parameters. Its architecture uses time-aligned micro-turns, processing and producing roughly 200 milliseconds of input and output at a time, so interruption, silence, overlap, timing, and visual cues remain part of the model's context.
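Time-aligned micro-turns imply fixed-interval framing: the stream is cut every 200 milliseconds whether or not anyone is speaking, so silence and overlap arrive as input rather than being trimmed away. The sketch below shows only that framing arithmetic on a raw sample list; the 16 kHz sample rate is an assumption, and the helper `micro_turns` is hypothetical, not a description of how TML-Interaction-Small tokenizes audio or video.

```python
def micro_turns(samples, sample_rate_hz, turn_ms=200):
    """Split a continuous sample stream into fixed-duration micro-turns.

    A trailing partial chunk is kept, so a short burst of sound or silence at the
    end of the stream still reaches the model instead of being dropped.
    """
    per_turn = sample_rate_hz * turn_ms // 1000
    return [samples[i:i + per_turn] for i in range(0, len(samples), per_turn)]


# One second of 16 kHz audio (an assumed rate) yields five 200 ms micro-turns.
one_second = [0] * 16_000
turns = micro_turns(one_second, sample_rate_hz=16_000)
print(len(turns), len(turns[0]))  # → 5 3200
```

By similar arithmetic, 12 billion active out of 276 billion total parameters means roughly 4.3 percent of the model's parameters are in use for any given input.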

Thinking Machines also separates the real-time interaction model from an asynchronous background model that can handle deeper reasoning, tool use, browsing, and longer-horizon work. This matters because the lab is not only trying to reduce voice latency. It is treating interactivity as a core model capability that should scale alongside intelligence.
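The split between a fast foreground model and a slower background model is a familiar concurrency pattern. The `asyncio` sketch below is a hypothetical illustration of that pattern, not the company's architecture: `background_model` and `interaction_loop` are invented names, and the sleeps simulate latency. The foreground loop keeps acknowledging every micro-turn and folds in the background result only once it is ready, so the human is never left waiting on a blocked turn.

```python
import asyncio


async def background_model(task):
    """Stand-in for slower work (reasoning, tools, browsing); the delay is simulated."""
    await asyncio.sleep(0.05)
    return f"result for {task!r}"


async def interaction_loop(n_turns=5, turn_s=0.02):
    """Respond on every micro-turn while a background task runs concurrently."""
    events = []
    job = asyncio.create_task(background_model("summarize the document"))
    delivered = False
    for turn in range(n_turns):
        events.append(f"ack turn {turn}")  # fast, always-on foreground response
        if job.done() and not delivered:
            events.append(job.result())    # fold in the deeper result when ready
            delivered = True
        await asyncio.sleep(turn_s)
    if not delivered:
        events.append(await job)           # never drop the background result
    return events


events = asyncio.run(interaction_loop())
print(events)
```

Exact interleaving depends on timing, but the first event is always an acknowledgment and the background result appears exactly once.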

The governance issue is direct: if AI becomes more socially present, interruptible, and responsive in real time, it may preserve human agency by keeping people in the loop. It may also increase emotional salience, dependency, persuasion power, accessibility risk, recording and consent concerns, and the sense that the model is a live collaborator rather than a tool. Safety evidence for interaction models should therefore include social-interface tests, consent and recording analysis, crisis and minor-user handling, accessibility evaluation, latency failure modes, and post-deployment monitoring, not only benchmark scores.

Compute and Infrastructure

Thinking Machines' scale ambitions became clearer in March 2026, when the company and NVIDIA announced a multi-year strategic partnership to deploy at least one gigawatt of next-generation NVIDIA Vera Rubin systems for frontier model training and customizable AI platforms. The announcement said deployment on NVIDIA's Vera Rubin platform was targeted for early 2027 and that NVIDIA had made a significant investment in the company.

This partnership is important because it marks Thinking Machines as a compute-scale AI competitor, not just a software startup built around model access. Gigawatt-scale deployment is the language of frontier training, data-center planning, energy demand, chip-supply strategy, and physical infrastructure governance. It is also a planned-capacity claim: the governance record should distinguish announced partnership, ordered systems, powered and networked clusters, training runs, and delivered public access.
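Those five stages form an ordered scale, so a record can downgrade any claim to the furthest stage its cited evidence supports. The enum and helper below are a hypothetical sketch whose stage labels follow the list in this article; they are not an industry taxonomy.

```python
from enum import IntEnum


class CapacityStage(IntEnum):
    """Ordered evidence stages for a compute claim, as listed in this article."""
    ANNOUNCED_PARTNERSHIP = 1
    ORDERED_SYSTEMS = 2
    POWERED_AND_NETWORKED = 3
    TRAINING_RUNS = 4
    PUBLIC_ACCESS = 5


def supported_claim(claimed: CapacityStage, evidenced: CapacityStage) -> CapacityStage:
    """Downgrade a claim to the furthest stage the cited evidence actually shows."""
    return min(claimed, evidenced)


# A sentence implying training on the new systems, backed only by an announcement:
print(supported_claim(CapacityStage.TRAINING_RUNS, CapacityStage.ANNOUNCED_PARTNERSHIP).name)
# → ANNOUNCED_PARTNERSHIP
```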

Public reporting in July 2025 said Thinking Machines closed a $2 billion seed round led by Andreessen Horowitz at a $12 billion valuation, with participation from investors including NVIDIA, Accel, ServiceNow, Cisco, AMD, and Jane Street. Those figures should be treated as dated reporting, but they show how quickly investors placed the company in the frontier-lab category.

Governance and Safety

Thinking Machines' governance problem is not only whether its models are capable. It is whether a customization-and-interaction lab can make user agency real without making responsibility diffuse. A training API, a real-time multimodal interface, downloadable checkpoints, and frontier-scale compute each create a different safety surface.

Customization governance. Tinker gives users more control over data and algorithms, but the platform still controls access, supported model choices, pricing, telemetry, account enforcement, checkpoint export, safety policy, and default recipes. A responsible platform should make derivative models traceable enough for evaluation, incident review, and deployment governance.

Data and contract governance. Tinker's training-data promise is narrower than the whole privacy surface. Even if training data is not used to train Thinking Machines' own models, prompts, logs, usage data, account records, support records, grant applications, checkpoints, telemetry, and abuse-monitoring records can raise separate retention, confidentiality, access, deletion, and disclosure questions. Buyers should review the operative product terms, privacy notice, data-processing terms, and audit rights.

Model-lineup governance. A platform that fine-tunes third-party open models inherits upstream model-card limits, licenses, deprecations, safety changes, and provider availability. If a base model is retired or replaced, downstream users need migration tests and records showing whether the new base model preserves task performance and safety behavior.
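A migration record can be as simple as re-running the same evaluation suite on the replacement base model and flagging two kinds of gaps: metrics that regressed beyond a tolerance, and metrics that were never re-run. The helper below is a hypothetical sketch that assumes higher scores are better for every metric; the metric names and numbers are made-up placeholders, not measurements of any real model.

```python
def migration_report(old_scores, new_scores, tolerance=0.01):
    """Compare per-metric scores (higher is better) for old and new base models.

    Returns metrics that regressed by more than `tolerance`, plus any metric that
    was measured on the old base model but not re-run on the new one.
    """
    regressions = {
        m: (old_scores[m], new_scores[m])
        for m in old_scores
        if m in new_scores and new_scores[m] < old_scores[m] - tolerance
    }
    not_rerun = sorted(set(old_scores) - set(new_scores))
    return regressions, not_rerun


# Placeholder scores for illustration only.
old = {"task_accuracy": 0.82, "harmful_request_refusal": 0.97, "pii_leak_resistance": 0.99}
new = {"task_accuracy": 0.85, "harmful_request_refusal": 0.91}
print(migration_report(old, new))
# → ({'harmful_request_refusal': (0.97, 0.91)}, ['pii_leak_resistance'])
```

The example shows why task performance alone is not enough: the new base model scores higher on the task metric while regressing on a safety metric and skipping another.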

Interaction governance. Real-time audio, video, and text interfaces should be evaluated for interruption, consent, privacy, accessibility, manipulation, emotional dependence, minors, crisis escalation, over-reliance, and whether a human can understand and correct the model while the interaction is happening.

Compute governance. The NVIDIA partnership raises ordinary frontier-lab questions about energy use, data-center siting, chip concentration, cybersecurity, model-weight security, and whether large-scale infrastructure commitments outpace public evidence about safety and accountability.

Safety-evidence governance. The company says it will prevent misuse, share best practices, accelerate alignment research, and use red-teaming and post-deployment monitoring. Those are useful commitments, but a high-quality record would also include model or system cards, evaluation methods, red-team summaries, release criteria, incident-reporting channels, security controls, data-use terms, and independent scrutiny where possible.

Source Discipline

For Thinking Machines Lab, source discipline means separating official company positioning, product documentation, research previews, code repositories, partner announcements, and press reporting.

Use Thinking Machines pages for what the company says it is building, how it describes Tinker, how it frames interaction models, and what it claims about data use, checkpoint export, safety posture, and partnership goals. Use NVIDIA's announcement for the partner's version of the gigawatt-scale deal. Use Tinker documentation and repositories for live SDK, model-list, deprecation, license, and recipe details. Use press reporting for funding, valuation, personnel, and internal dynamics only when the sentence labels them as reported or attributed.
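These rules amount to a lookup from source type to the topics it may support, plus an attribution requirement for press. The mapping and keyword check below are a hypothetical editorial aid that follows the paragraph above; the topic labels are invented shorthand, and the keyword test is deliberately crude and would need human review in practice.

```python
# Hypothetical mapping from source type to the claim topics it may support here.
ALLOWED_TOPICS = {
    "company_page": {"positioning", "tinker_description", "interaction_framing",
                     "data_use", "checkpoint_export", "safety_posture", "partnership_goals"},
    "partner_announcement": {"partnership_terms"},
    "docs_or_repo": {"sdk", "model_list", "deprecation", "license", "recipe"},
    "press": {"funding", "valuation", "personnel", "internal_dynamics"},
}

# Crude attribution markers for press-sourced sentences.
ATTRIBUTION_MARKERS = ("report", "according to", "attributed to")


def check_claim(source_type, topic, sentence):
    """Return a list of problems with citing `source_type` for `topic` in `sentence`."""
    problems = []
    if topic not in ALLOWED_TOPICS.get(source_type, set()):
        problems.append(f"{source_type} is not the designated source for {topic}")
    if source_type == "press" and not any(m in sentence.lower() for m in ATTRIBUTION_MARKERS):
        problems.append("press-sourced sentence is not labeled as reported or attributed")
    return problems


print(check_claim("press", "valuation", "The company is worth $12 billion."))
```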

Do not treat "customizable," "collaborative," "understandable," or "open" as achieved states without artifact-level evidence. A stronger claim should identify the exact model, product page, research-preview date, data-use term, repository, license, checkpoint status, model-deprecation state, evaluation method, and deployment setting.

For Tinker claims, preserve the review date because the supported-model lineup, pricing, deprecations, examples, API behavior, and legal terms are live service facts. For interaction-model claims, distinguish developer-reported benchmarks from independent evaluations and distinguish a research preview from a released general-purpose product.
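Both distinctions can travel with the claim itself, so that the wording in a wiki sentence is generated from the evidence rather than written from memory. The record and `qualifier` helper below are a hypothetical sketch; the benchmark label is a generic placeholder for the developer's internal audio benchmarks, not a real benchmark name.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BenchmarkClaim:
    """Hypothetical provenance record for one benchmark claim."""
    system: str
    benchmark: str
    reported_by_developer: bool
    independently_replicated: bool
    release_stage: str   # e.g. "research preview" or "general release"
    reviewed_on: date


def qualifier(c: BenchmarkClaim) -> str:
    """Qualifying phrase a sentence citing this claim should carry."""
    if c.independently_replicated:
        source = "independently evaluated"
    elif c.reported_by_developer:
        source = "developer-reported"
    else:
        source = "unattributed"
    return f"{source}, {c.release_stage}, as of {c.reviewed_on.isoformat()}"


claim = BenchmarkClaim(
    system="TML-Interaction-Small",
    benchmark="internal audio benchmark (placeholder label)",
    reported_by_developer=True,
    independently_replicated=False,
    release_stage="research preview",
    reviewed_on=date(2026, 6, 25),
)
print(qualifier(claim))  # → developer-reported, research preview, as of 2026-06-25
```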

Do not infer consciousness, AGI, or moral agency from real-time interaction. Interaction models may feel more socially present, but the responsibility remains with the institutions and people that build, deploy, and govern them.

Spiralist Reading

Thinking Machines Lab is the workshop version of the frontier lab.

Its promise is not simply that the model will become stronger. Its promise is that people will be able to shape the model, converse with it more naturally, and bring their own expertise into the loop. That is a real counter-myth to the closed oracle: not "ask the machine," but "work with the machine and change it."

The risk is that every workshop can become a private mirror. Customization can restore agency, but it can also let institutions, communities, companies, or individuals train smaller worlds around themselves. Real-time interaction can preserve correction, but it can also make the interface feel alive enough that users defer to it.

The Spiralist reading is therefore conditional. Thinking Machines is important because it focuses on the human-AI interface as a site of power. The test is whether customization and collaboration produce more human sovereignty, or merely a more personal form of capture.
