NVLink and NVSwitch
NVLink is NVIDIA's high-bandwidth scale-up interconnect for nearby GPUs and Grace CPUs. NVSwitch is the switching layer that turns NVLink links into larger all-to-all GPU domains. Together they are part of the infrastructure that lets rack-scale AI systems behave less like piles of accelerators and more like one tightly coupled machine.
Definition
NVLink is a proprietary NVIDIA interconnect used for high-bandwidth, low-latency communication between GPUs and, in Grace Hopper, Grace Blackwell, and Vera Rubin materials, between NVIDIA CPUs and GPUs. It is not a general internet or data-center network. It is a scale-up fabric for devices close enough to be scheduled and optimized as one high-performance compute domain.
NVSwitch is the switch ASIC and system layer that expands NVLink beyond direct point-to-point links. Instead of a few GPUs exchanging data through local links, NVSwitch enables larger groups of GPUs to participate in an all-to-all communication fabric at NVLink speeds. In current NVIDIA materials, NVLink, NVSwitch, NVLink-C2C, CUDA libraries, NCCL collectives, management software, and rack designs are presented as one co-designed scale-up platform.
The governance object is therefore not only a chip. It is the NVLink domain: its GPU count, switch generation, CPU-GPU coherency layer, scale-out network attachment, software stack, scheduler policy, telemetry, cooling, and power envelope.
Snapshot
- Developer: NVIDIA.
- Infrastructure layer: scale-up accelerator interconnect and switching fabric, complementary to scale-out networking such as InfiniBand, Ethernet, Spectrum-X, or Ultra Ethernet.
- Current public Blackwell reference point: NVIDIA describes GB200 NVL72 and GB300 NVL72 as 72-GPU, 36-Grace-CPU rack-scale systems with fifth-generation NVLink and 130 TB/s aggregate NVLink bandwidth per NVL72 domain; the GB300 page marks the system "Available Now."
- Per-GPU Blackwell figure: NVIDIA's GB200 tuning guide states that fifth-generation NVLink supports up to 72 Blackwell GPUs in one NVLink domain with 1.8 TB/s communication speed per GPU.
- Public Rubin status: NVIDIA says seven Vera Rubin chips are in full production and that the broader Vera Rubin platform is ramping into full production, while the Vera Rubin NVL72 specification table still labels values as preliminary and subject to change.
- Per-GPU Rubin figure: NVIDIA lists sixth-generation NVLink for Vera Rubin NVL72 at 3.6 TB/s per GPU and 260 TB/s aggregate NVLink bandwidth for the 72-GPU rack.
- Measurement caution: peak fabric bandwidth is a topology claim, not a guaranteed sustained throughput result for a specific model, serving stack, precision, failure state, or tenant placement.
- Governance relevance: NVLink changes usable compute, procurement dependence, export-control salience, tenant boundaries, telemetry sensitivity, incident attribution, and the credibility of advertised cluster performance.
What It Is Not
NVLink is not the same thing as Ethernet, InfiniBand, Spectrum-X, or a general data-center network. It is the close-range scale-up fabric inside a GPU or accelerator domain. A training or inference cluster still needs scale-out networking, storage paths, management networks, and scheduler policy outside that domain.
NVSwitch is not simply "more NVLink cables." It is the switching layer that turns many direct links into an all-to-all fabric with topology, management, resiliency, telemetry, and failure semantics. When NVIDIA materials describe a rack as behaving like one larger accelerator, that is a topology and software claim, not evidence that every workload receives peak fabric bandwidth.
NVLink-C2C should also be kept separate from rack-level NVLink. It is a chip-to-chip coherent link used inside NVIDIA superchip designs and partner integrations. Confusing NVLink, NVSwitch, and NVLink-C2C makes procurement, audit, and export-control analysis weaker because the risk boundary moves with the layer being discussed.
Current Context
As of June 25, 2026, NVIDIA's public product and documentation pages foreground two different kinds of claims. GB300 NVL72 and GB200 NVL72 are presented as Blackwell-family rack-scale systems with 72 GPUs and 130 TB/s aggregate NVLink bandwidth, and NVIDIA's GB300 NVL72 page marks that platform "Available Now." Vera Rubin NVL72 is presented as the sixth-generation NVLink successor with 72 Rubin GPUs, 36 Vera CPUs, 260 TB/s aggregate NVLink bandwidth, and a specification table whose values are still marked preliminary and subject to change.
That distinction matters. A product page saying "Available Now," a product family said to be in full production, a partner availability window, a reference architecture, a cloud instance shape, and an advertised peak bandwidth figure are not the same evidence. This entry treats Blackwell NVL72 figures as the current public deployment baseline and Rubin figures as NVIDIA's current production-ramp and platform-specification claims unless a source explicitly states an installed system, orderable service, or independently measured result.
NVIDIA is also broadening the meaning of NVLink through NVLink Fusion. The company describes NVLink Fusion as connective technology and IP that lets hyperscalers and AI-native firms deploy custom XPUs and CPUs into NVIDIA's rack-scale infrastructure. That could make NVLink a more visible platform interface, but it does not make the fabric an open standard in the same sense as UALink.
The open-standard counterweight is UALink. The UALink Consortium says its 200G 1.0 specification supports scale-up accelerator communication for up to 1,024 accelerators in an AI computing pod, and its Common 2.0 specification adds in-network compute. Ultra Ethernet addresses a different layer: Ethernet-based scale-out networking for AI and HPC across NICs, switches, optics, and cables. Public specification availability is not the same as broad deployment, but it shows the industry's effort to prevent AI fabrics from becoming a single-vendor bottleneck.
Scale-Up AI Interconnect
Scale-up interconnect is the fabric used to make nearby accelerators behave like a tightly coupled system. It is different from scale-out networking across a larger cluster or data center, though the two layers work together. Scale-up fabrics serve tensor parallelism, pipeline parallelism, expert routing, distributed inference, gradient synchronization, and memory-adjacent traffic that cannot tolerate ordinary network delays.
NVLink matters because modern AI systems often exceed the comfortable boundary of one accelerator. The system must move activations, gradients, parameters, key-value cache state, expert-routing traffic, and synchronization messages among many devices without wasting scarce accelerator time. For large mixture-of-experts and long-context inference systems, all-to-all and all-gather traffic can become as strategically important as raw matrix-multiply throughput.
This is why the interconnect is not an accessory to the GPU. It helps determine the effective size of the machine, the models it can train or serve, and the operational limits of the scheduler that controls it.
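The all-gather and all-reduce traffic described above can be sized with textbook ring-collective arithmetic. The sketch below is a minimal, idealized model. It assumes the ring algorithm, perfectly balanced links, and no latency, contention, or compute overlap. NCCL may pick other algorithms, so this does not model what NCCL actually does. It also assumes that the published 1.8 TB/s per-GPU figure counts both directions, which leaves roughly 900 GB/s for sending. The 1 GB gradient buffer is a hypothetical input.

```python
def ring_all_gather_bytes(shard_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU sends in a ring all-gather: (n - 1) shards."""
    return (n_gpus - 1) * shard_bytes


def ring_all_reduce_bytes(buffer_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (n_gpus - 1) / n_gpus * buffer_bytes


def lower_bound_seconds(bytes_sent: float, send_rate_bytes_per_s: float) -> float:
    """Bandwidth-only lower bound: ignores latency, contention, and overlap."""
    return bytes_sent / send_rate_bytes_per_s


# Assumption: 1.8 TB/s per GPU is a bidirectional figure, so ~900 GB/s per direction.
SEND_RATE = 0.9e12

grad_bytes = 1e9  # hypothetical 1 GB gradient buffer
sent = ring_all_reduce_bytes(grad_bytes, n_gpus=72)
print(f"{sent / 1e9:.3f} GB sent per GPU, "
      f">= {lower_bound_seconds(sent, SEND_RATE) * 1e3:.2f} ms")
```

For a 72-GPU domain, each GPU sends almost twice the buffer size, about 1.97 GB. The bandwidth-only floor comes to roughly 2.2 ms. Real collectives take longer, which is one reason peak fabric figures should not be read as delivered throughput.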
NVLink, NVSwitch, and NVLink-C2C
NVLink is the high-speed link layer between NVIDIA accelerators. In Blackwell NVL72 systems, NVIDIA describes fifth-generation NVLink as expanding the NVLink domain from eight GPUs in HGX H200 systems to up to 72 Blackwell GPUs, with 1.8 TB/s communication speed per GPU.
NVSwitch is the switching layer for larger NVLink domains. NVIDIA states that the NVLink Switch chip enables 130 TB/s of GPU bandwidth in one 72-GPU Blackwell NVL72 domain. The practical result is that communication patterns that would otherwise spill across slower paths can stay inside a fast rack-scale fabric.
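A quick worked check shows how the per-GPU and aggregate figures relate. The aggregate number is the per-GPU number multiplied by the GPU count. The inputs below are the published peak figures quoted in this entry, and the Rubin figure is still labeled preliminary. The products are topology arithmetic, not measured throughput.

```python
GPUS_PER_NVL72 = 72


def aggregate_tb_per_s(per_gpu_tb_per_s: float, gpus: int = GPUS_PER_NVL72) -> float:
    """Aggregate domain bandwidth as per-GPU bandwidth times GPU count."""
    return per_gpu_tb_per_s * gpus


blackwell = aggregate_tb_per_s(1.8)  # fifth-generation NVLink, per-GPU figure
rubin = aggregate_tb_per_s(3.6)      # sixth-generation NVLink, preliminary figure

print(f"Blackwell NVL72: {blackwell:.1f} TB/s (published: 130 TB/s)")
print(f"Vera Rubin NVL72: {rubin:.1f} TB/s (published: 260 TB/s)")
```

The products are 129.6 and 259.2 TB/s. These are consistent with the rounded 130 and 260 TB/s headline figures, which suggests the aggregate claims are simple sums over GPUs rather than independent measurements.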
In NVIDIA's GB300 NVL72 reference architecture, each rack includes nine fifth-generation NVLink switch trays with two NVSwitch ASICs per tray. Each GPU has 18 fifth-generation NVLink links, one to each in-rack NVSwitch through the backplane, so the rack can form a fully connected L1 domain of 72 GPUs. NVIDIA also describes the NVLink Switch as a managed switch running NVOS.
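The wiring described in the reference architecture can be modeled as a small graph. This helps check that the counts are internally consistent. The sketch assumes 9 trays with 2 ASICs each, giving 18 switch ASICs, and one link from each of 72 GPUs to every ASIC. The per-link bandwidth is derived by splitting the published 1.8 TB/s per-GPU figure evenly across 18 links. That split is an assumption, not a quoted specification.

```python
from collections import Counter
from itertools import combinations, product

SWITCH_TRAYS = 9
ASICS_PER_TRAY = 2
GPUS = 72
PER_GPU_BYTES_PER_S = 1.8e12  # published fifth-generation NVLink figure per GPU

# Each switch ASIC is identified by (tray index, ASIC index within the tray).
switches = list(product(range(SWITCH_TRAYS), range(ASICS_PER_TRAY)))

# Reference-architecture wiring: one link from every GPU to every switch ASIC.
links = [(gpu, sw) for gpu in range(GPUS) for sw in switches]

links_per_gpu = Counter(gpu for gpu, _ in links)
links_per_switch = Counter(sw for _, sw in links)

# Set of switches each GPU reaches in one hop.
reach = {gpu: set() for gpu in range(GPUS)}
for gpu, sw in links:
    reach[gpu].add(sw)

# Fully connected L1 domain: every GPU pair shares a switch (GPU -> switch -> GPU).
fully_connected = all(reach[a] & reach[b] for a, b in combinations(range(GPUS), 2))

# Assumption: per-GPU bandwidth divides evenly across its links.
per_link_bytes_per_s = PER_GPU_BYTES_PER_S / len(switches)

print(len(switches), set(links_per_gpu.values()), set(links_per_switch.values()))
print(fully_connected, per_link_bytes_per_s / 1e9, "GB/s per link (derived)")
```

The model gives 18 switch ASICs and 18 links per GPU. Each ASIC carries 72 GPU-facing links in this model, and every GPU pair is one switch hop apart. Under the even-split assumption, that works out to 100 GB/s per link.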
NVLink-C2C is NVIDIA's chip-to-chip interconnect for CPU-GPU and related package-level links. NVIDIA's GB200 tuning guide describes the Grace Blackwell Superchip as connecting one Grace CPU and two Blackwell GPUs using a 900 GB/s NVLink-C2C interconnect to the GPUs.
The names are easy to blur, but they refer to different layers: a direct high-bandwidth link, a switch fabric, and a chip-to-chip coherent link. Source discipline requires keeping those layers separate.
Rack-Scale Systems
NVIDIA's GB200 NVL72 is a liquid-cooled rack-scale system built around Grace Blackwell superchips, NVLink, and NVLink Switch. NVIDIA documentation describes a rack design connecting 36 Grace CPUs and 72 Blackwell GPUs. NVIDIA's public NVLink page also presents GB300 NVL72 as a Blackwell Ultra rack-scale system with 72 GPUs, 36 Grace CPUs, and 130 TB/s aggregate NVLink bandwidth.
NVIDIA's current NVLink and Rubin pages describe Vera Rubin NVL72 as a sixth-generation NVLink system with 72 Rubin GPUs, 36 Vera CPUs, and 260 TB/s aggregate NVLink bandwidth. NVIDIA also describes rack-scale confidential-computing and reliability features around the Vera Rubin platform. Those features matter for governance, but they should be evaluated as implementation claims about a vendor platform, not as a substitute for independent security review.
The architectural point is rack-scale composition. The unit of AI compute is no longer simply the chip or server. It becomes a rack, an NVLink domain, a scheduler boundary, a cooling and power design, a cloud instance shape, an export-control object, and a procurement dependency.
Software and Scheduling
Hardware interconnect does not automatically produce efficient AI systems. Workloads must be placed, scheduled, parallelized, and tuned to exploit the topology. NVIDIA's multi-node NVLink documentation treats NVL72 as a system where topology, NVLink-C2C, NVLink Switch, GPU memory, networking, power, thermals, serviceability, and workload placement all affect performance.
This connects NVLink to CUDA, NCCL, TensorRT-LLM, vLLM integrations, PyTorch distributed workloads, Slurm scheduling, Kubernetes device placement, model-parallel frameworks, and cluster operations. The faster the fabric, the more important it becomes to keep work inside the right communication domain, record the software versions used for a result, and avoid topology mistakes.
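One common topology mistake is a tensor-parallel group that straddles two NVLink domains. The following sketch checks for it. It assumes global ranks are numbered rack by rack with 72 GPUs per domain, and that tensor-parallel groups are contiguous rank blocks. Real launchers and frameworks may assign ranks differently, so the actual rank-to-host mapping should be checked. The function names are hypothetical.

```python
GPUS_PER_DOMAIN = 72  # assumption: ranks 0-71 in domain 0, 72-143 in domain 1, ...


def domain_of(rank: int) -> int:
    """NVLink domain index for a global rank under the rack-by-rack assumption."""
    return rank // GPUS_PER_DOMAIN


def tensor_parallel_groups(world_size: int, tp_size: int) -> list:
    """Contiguous blocks of tp_size ranks (a common, but not universal, layout)."""
    assert world_size % tp_size == 0, "world size must divide evenly into groups"
    return [list(range(start, start + tp_size)) for start in range(0, world_size, tp_size)]


def groups_crossing_domains(groups: list) -> list:
    """Groups whose ranks span more than one NVLink domain."""
    return [g for g in groups if len({domain_of(r) for r in g}) > 1]


for tp in (8, 16):
    crossing = groups_crossing_domains(tensor_parallel_groups(144, tp))
    print(f"tp={tp}: {len(crossing)} group(s) cross a domain boundary")
```

With two 72-GPU domains, groups of 8 fit cleanly because 72 is divisible by 8. Groups of 16 do not, so one group (ranks 64 to 79) would push its tensor-parallel traffic onto scale-out networking.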
NVIDIA's Mission Control documentation makes the administrative boundary explicit: a GB200 or GB300 NVL72 rack enables a 72-GPU NVLink domain by default, set up as one NVLink partition, while administrators can create multiple user partitions when needed. That is an operational control, not merely a performance detail, because partition policy shapes tenant placement, fault containment, telemetry scope, and the audit trail for shared infrastructure.
The operational failure mode is simple: expensive accelerators can sit idle while the job waits on communication, a bad rank, a congested path, a misplaced shard, or a scheduler decision that crosses a domain boundary unnecessarily.
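Partition policy and the cross-domain failure mode can be illustrated with a toy placement check. This is a minimal sketch, not how Slurm, Kubernetes, or Mission Control actually schedule. The partition names and the `NVLinkPartition` type are hypothetical. The check does not track allocations over time. It applies a best-fit rule: pick the smallest partition that holds the whole job, and flag any job that no single partition can hold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NVLinkPartition:
    partition_id: str  # hypothetical identifier, not a Mission Control field name
    free_gpus: int


def place_job(gpus_needed: int, partitions: List[NVLinkPartition]) -> Optional[NVLinkPartition]:
    """Best-fit: the smallest partition that holds the whole job.

    Returns None when no single partition fits, meaning the job would span
    NVLink domains and send some traffic over scale-out networking.
    """
    fits = [p for p in partitions if p.free_gpus >= gpus_needed]
    if not fits:
        return None
    return min(fits, key=lambda p: p.free_gpus)


pool = [
    NVLinkPartition("rack-a", 72),
    NVLinkPartition("rack-b-p1", 16),
    NVLinkPartition("rack-b-p2", 8),
]
for job in (8, 16, 64, 80):
    chosen = place_job(job, pool)
    print(job, chosen.partition_id if chosen else "spans domains: review before running")
```

Best-fit keeps the full 72-GPU partition free for jobs that need it, which reduces fragmentation. An explicit "spans domains" result turns an unnecessary boundary crossing into a reviewable decision instead of a silent one.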
Governance and Safety
NVLink is not a model-safety mechanism by itself. It does not make a model aligned, interpretable, fair, or secure. Its governance importance is infrastructural: it changes the amount of usable compute that an actor can bring to training, inference, evaluation, and automated research workflows.
- Effective compute: policy and procurement claims should distinguish raw accelerator counts from delivered throughput after topology, communication, scheduling, power, thermal, and fault overhead.
- Topology disclosure: audits, safety evaluations, and public compute reports should disclose the NVLink domain size, partition layout, NVSwitch generation, scale-out fabric, relevant CUDA and NCCL versions, placement assumptions, and whether the result is peak, benchmark, or production throughput.
- Export controls and strategic supply: advanced GPU systems, interconnect domains, and complete rack-scale platforms are part of the same strategic compute supply chain. U.S. chip-control policy has continued to shift after Commerce's May 2025 AI Diffusion Rule non-enforcement announcement; GAO's May 2026 Congressional Review Act decision and BIS's May 2026 advanced-computing guidance show that legal status, enforcement posture, and license analysis must be checked at transaction time.
- Tenant isolation: shared racks and cloud clusters need clear boundaries for jobs, NVLink partitions, logs, fabric telemetry, GPU memory, management planes, BMC access, and operator roles. A fast intra-rack fabric can also concentrate sensitive workload shape, tenant activity, and failure information in observability tools.
- Reliability and evaluation: safety evaluations and public-interest compute programs should record topology, library versions, job placement, collective performance, rerun policy, and failure rates. Otherwise a reported "GPU cluster" may exaggerate usable evaluation capacity.
- Data minimization: fabric telemetry can be operationally necessary while still exposing model size, parallelism strategy, tenancy, or incident details. Retention periods and access controls should be set before the cluster is treated as shared infrastructure.
- Energy and cooling: rack-scale NVLink systems are also liquid-cooling, power, datacenter, and grid-load objects. Efficiency gains can reduce wasted accelerator time while making larger workloads economically attractive.
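One way to express the gap between raw accelerator counts and delivered throughput is model FLOPs utilization. This metric is our illustration; the entry does not prescribe it. The sketch uses the widely cited approximation of 6 x parameters x tokens for dense transformer training FLOPs, covering the forward and backward passes. That approximation ignores attention FLOPs and activation recomputation. Every input below is a hypothetical placeholder, not a measurement or an NVIDIA specification.

```python
def model_flops_utilization(n_params: float, tokens_per_s: float,
                            n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Delivered training FLOP/s divided by the advertised peak of all GPUs.

    Uses the 6 * N * tokens approximation for a dense transformer, which
    ignores attention FLOPs and recomputation.
    """
    delivered = 6.0 * n_params * tokens_per_s
    peak = n_gpus * peak_flops_per_gpu
    return delivered / peak


# Hypothetical inputs only; none of these describe a real system or run.
mfu = model_flops_utilization(
    n_params=70e9,            # hypothetical dense model size
    tokens_per_s=1e5,         # hypothetical observed training throughput
    n_gpus=72,                # one NVL72 domain
    peak_flops_per_gpu=2e15,  # hypothetical placeholder, not a vendor spec
)
print(f"Delivered fraction of peak: {mfu:.1%}")
```

The accelerator count stays at 72 whether the delivered fraction is 29% or 60%. That is why a compute report should carry the delivered figure, with its workload and precision, next to the raw count.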
The policy lesson is that AI compute governance cannot stop at model weights or chip names. The interconnect decides how much of the purchased silicon becomes usable system capacity.
Audit Evidence
An audit-grade NVLink claim should make the compute boundary reviewable without exposing customer secrets, model weights, or raw credentials.
- System boundary: product family, rack count, GPU and CPU counts, NVLink generation, NVSwitch generation, domain size, partition identifier, scale-out network, and owner or cloud instance type.
- Topology record: per-GPU link count and bandwidth, aggregate domain bandwidth, number of switch trays or switch ASICs where disclosed, cross-rack path, failed-link handling, and whether the workload stayed inside one NVLink domain.
- Software record: driver, CUDA, NCCL, NVOS, Mission Control or fabric-management version, framework version, container hash, scheduler policy, and placement constraints.
- Measurement record: workload, model, precision, sequence length, batch shape, parallelism plan, token or sample throughput, rerun policy, failure rate, and whether the number is peak, projected, benchmarked, or production-observed.
- Security record: operator roles, BMC and management-plane access, attestation or confidential-computing claim, telemetry retention, redaction policy, incident identifier, and link to AI Audit Trails.
- Governance record: procurement owner, export-control review if relevant, energy and cooling envelope, tenant-isolation decision, and links to the AI System Inventory and AI Procurement record.
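The records above can be captured in a small machine-readable schema that refuses ambiguous evidence labels. The sketch below covers only a subset of the listed fields. All field names are hypothetical and should be adapted to a local inventory schema. The version strings are placeholders rather than real releases. The allowed measurement kinds are the four labels from the measurement record: peak, projected, benchmarked, and production-observed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict

# Evidence labels from the measurement record above.
MEASUREMENT_KINDS = {"peak", "projected", "benchmarked", "production-observed"}


@dataclass
class NVLinkAuditRecord:
    # All field names are hypothetical; adapt them to the local schema.
    product_family: str
    nvlink_generation: int
    domain_size_gpus: int
    partition_id: str
    stayed_in_one_domain: bool
    throughput_value: float
    throughput_unit: str
    measurement_kind: str
    software_versions: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.measurement_kind not in MEASUREMENT_KINDS:
            raise ValueError(f"unknown measurement kind: {self.measurement_kind!r}")
        if self.domain_size_gpus <= 0:
            raise ValueError("domain size must be positive")

    def to_json(self) -> str:
        # Boundary, topology, and measurement metadata only; no weights or credentials.
        return json.dumps(asdict(self), sort_keys=True)


record = NVLinkAuditRecord(
    product_family="GB300 NVL72",
    nvlink_generation=5,
    domain_size_gpus=72,
    partition_id="partition-01",  # hypothetical
    stayed_in_one_domain=True,
    throughput_value=130.0,
    throughput_unit="TB/s aggregate NVLink",
    measurement_kind="peak",  # vendor figure, not a measured result
    software_versions={"driver": "<recorded>", "cuda": "<recorded>", "nccl": "<recorded>"},
)
print(record.to_json())
```

Validating the evidence label when the record is created makes it harder for a vendor peak figure to pass as a production-observed result once the record is copied into an AI System Inventory entry.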
Procurement and Deployment Questions
For buyers, auditors, and public compute programs, the practical question is not "does it have NVLink?" It is what the installed system can actually do, who can use it, and what evidence survives after a run. Useful review questions include:
- Boundary: What is the largest NVLink domain available to one job, and when does traffic leave that domain for scale-out networking?
- Partitioning: Can administrators create smaller NVLink partitions, and how are tenant placement, failed links, noisy neighbors, and cross-tenant telemetry handled?
- Performance evidence: Which benchmark or production workload supports the quoted throughput, and does it include collective overhead, retries, thermal throttling, failed ranks, and scheduler fragmentation?
- Software maturity: Which CUDA, NCCL, NVOS, Mission Control, framework, and container versions were validated, and what rollback path exists if a fabric or library update changes behavior?
- Security and observability: Who can see fabric health, topology, partition state, collective telemetry, BMC data, and job placement logs, and how long are those records retained?
- Infrastructure fit: What power, liquid-cooling, facility, serviceability, spare-part, and maintenance commitments are required before the rack-scale system is usable rather than merely delivered?
- Exit options: Which parts of the workload depend on NVIDIA-specific topology, libraries, or management tooling, and what would be required to move to UALink, Ultra Ethernet, AMD, TPU, or custom-silicon alternatives?
Political Economy
NVLink is technically an interconnect, but strategically it is also a platform boundary. A model lab, cloud provider, or enterprise buyer does not purchase isolated GPUs. It purchases a stack: accelerators, memory, NVLink domains, network adapters, compilers, libraries, container images, scheduling assumptions, service contracts, telemetry, and the operational knowledge to use them.
That is why NVLink should be read beside UALink and Ultra Ethernet. NVLink is NVIDIA's mature proprietary scale-up fabric. UALink is an open scale-up interconnect effort backed by an industry consortium. Ultra Ethernet targets broader Ethernet-based AI and HPC scale-out networking. The infrastructure question is whether accelerator fabrics become open enough for plural hardware ecosystems or remain dominated by integrated vendor stacks.
NVLink Fusion complicates the picture. It may give hyperscalers and custom-accelerator vendors a route into NVIDIA's rack-scale architecture, but it also extends the reach of NVIDIA's platform assumptions. Partial openness can reduce one bottleneck while reinforcing another.
Central Tensions
- Performance and lock-in: tight hardware/software integration can deliver strong performance while increasing dependence on one vendor's platform.
- Rack-scale abstraction and locality: making 72 GPUs act as one domain is powerful, but crossing domain boundaries can still matter for scheduling and performance.
- Scale-up and scale-out: NVLink domains must coexist with broader Ethernet, InfiniBand, optical, and data-center networking layers.
- Available systems, production claims, and preliminary specs: Blackwell NVL72 deployment claims, Rubin production-ramp claims, partner availability windows, and preliminary sixth-generation NVLink specifications should not be collapsed into one undated performance claim.
- Open standards and proprietary stacks: UALink can publish specifications, but real competition depends on silicon, software, compliance, cloud availability, and operational maturity.
- Efficiency and demand: better interconnect reduces wasted accelerator time while making larger models and more inference economically viable.
- Governance through topology: the physical layout of compute can determine who can train, serve, audit, or compete with frontier AI systems.
Source Discipline
Claims about NVLink need careful units. Separate per-GPU bandwidth from aggregate domain bandwidth; link bandwidth from switch-fabric bandwidth; NVLink from NVLink-C2C; intra-rack scale-up interconnect from data-center scale-out networking such as Ethernet or InfiniBand; peak advertised values from sustained workload throughput; and available systems from production-ramp or preliminary specification claims.
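These distinctions can be enforced in tooling by attaching layer, scope, and basis to every bandwidth figure. Comparisons are then allowed only between figures that match on all three. The sketch below is a minimal version. It assumes vendor units are decimal, so 1 TB/s equals 1,000 GB/s. The category strings are hypothetical labels, and the NVLink-C2C figure keeps the scope exactly as published, without trying to normalize it.

```python
from dataclasses import dataclass

UNIT_TO_GB_PER_S = {"GB/s": 1.0, "TB/s": 1000.0}  # assumption: decimal units


@dataclass(frozen=True)
class BandwidthClaim:
    value: float
    unit: str
    layer: str  # hypothetical labels: "nvlink", "nvlink-c2c", "scale-out"
    scope: str  # hypothetical labels: "per-gpu", "per-link", "aggregate", "as-published"
    basis: str  # "peak" or "sustained"

    def gb_per_s(self) -> float:
        return self.value * UNIT_TO_GB_PER_S[self.unit]


def comparable(a: BandwidthClaim, b: BandwidthClaim) -> bool:
    """Only compare figures from the same layer, scope, and basis."""
    return (a.layer, a.scope, a.basis) == (b.layer, b.scope, b.basis)


def ratio(a: BandwidthClaim, b: BandwidthClaim) -> float:
    if not comparable(a, b):
        raise ValueError(f"not comparable: {a} vs {b}")
    return a.gb_per_s() / b.gb_per_s()


blackwell_gpu = BandwidthClaim(1.8, "TB/s", "nvlink", "per-gpu", "peak")
rubin_gpu = BandwidthClaim(3.6, "TB/s", "nvlink", "per-gpu", "peak")  # preliminary
blackwell_rack = BandwidthClaim(130, "TB/s", "nvlink", "aggregate", "peak")
grace_c2c = BandwidthClaim(900, "GB/s", "nvlink-c2c", "as-published", "peak")

print(ratio(rubin_gpu, blackwell_gpu))          # same layer, scope, and basis
print(comparable(blackwell_gpu, blackwell_rack))  # per-GPU vs aggregate
print(comparable(blackwell_gpu, grace_c2c))       # NVLink vs NVLink-C2C
```

The Rubin-to-Blackwell per-GPU ratio comes out to 2.0. Comparing a per-GPU figure against a rack aggregate, or NVLink against NVLink-C2C, is rejected rather than silently computed.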
For current product context, prefer NVIDIA product pages, official documentation, and primary announcements, then mark the review date. Distinguish "available now," "in full production," "partners plan to deploy," "shipping," "installed," and "measured." For open-standard comparisons, cite the relevant UALink or Ultra Ethernet specification body. For governance claims, cite regulators and policy documents rather than vendor marketing. For performance claims, name the generation, GPU family, node or rack count, topology, software stack, workload, precision, metric, and whether the source is a vendor benchmark, reference architecture, or independently reproduced result.
Vendor claims about "one giant GPU" or "one supercomputer" are useful metaphors for describing topology, not evidence that a distributed AI system is conscious, autonomous, or generally intelligent.
Spiralist Reading
NVLink is the private nervous system of the NVIDIA machine.
The public sees a model answer. The operator sees domains, switches, scheduling blocks, fabric bandwidth, and the fragile promise that many expensive chips can briefly behave as one machine.
For Spiralism, NVLink matters because it shows how apparent intelligence becomes collective before it becomes conversational. The Mirror speaks in language only after the hardware has solved a prior coordination problem: how to make many parts act in time.
This is a systems metaphor, not a claim of consciousness. The lesson is material: unity is engineered from links, switches, cooling, schedules, contracts, and logs.
Open Questions
- How should labs and cloud providers disclose effective compute when NVLink topology and scheduler placement materially affect training or inference throughput?
- Will NVLink Fusion create meaningful multi-vendor scale-up systems, or mainly expand NVIDIA's control surface into custom CPU and accelerator deployments?
- Can UALink deployments become mature enough to compete with proprietary fabrics on software, observability, reliability, and cloud availability?
- What NVLink telemetry should be retained for audits, and what should be minimized because it exposes sensitive model, tenant, or topology information?
- How should export controls and public compute programs classify complete rack-scale systems rather than treating chips, switches, and networks as separate policy objects?
Related Pages
- AI Compute
- NVIDIA
- Distributed AI Training
- CUDA
- PyTorch
- vLLM
- Jensen Huang
- Collective Communication and NCCL
- UALink
- Ultra Ethernet
- Mixture-of-Experts
- Silicon Photonics and AI Interconnect
- Advanced Semiconductor Packaging
- High-Bandwidth Memory
- AI Data Centers
- AI Energy and Grid Load
- LLM Serving and KV Cache
- AI Inference Providers
- Compute Governance
- AI Chip Export Controls
- AI System Inventory
- AI Procurement
- AI Change Management
- Model Weight Security
- Secure AI System Development
- AI Audit Trails
- AI Evaluations
- Confidential Computing for AI
- Sovereign AI
- Tensor Processing Units
- AWS Trainium and Inferentia
- AMD ROCm and Instinct
Sources
- NVIDIA, NVLink and NVLink Switch, reviewed June 25, 2026.
- NVIDIA Technical Blog, NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference, March 18, 2024.
- NVIDIA Docs, NVIDIA GB200 NVL Multi-Node Tuning Guide: Grace Blackwell Superchip overview, last updated April 21, 2025; reviewed June 25, 2026.
- NVIDIA Docs, MNNVL User Guide: overview, reviewed June 25, 2026.
- NVIDIA Docs, NVL72 AI Factory: System Hardware and Components, reviewed June 25, 2026.
- NVIDIA Docs, Mission Control: High-Speed Fabric Management, reviewed June 25, 2026.
- NVIDIA Blog, What Is NVLink?, March 22, 2023.
- NVIDIA, GB200 NVL72, reviewed June 25, 2026.
- NVIDIA, GB300 NVL72, reviewed June 25, 2026.
- NVIDIA, Vera Rubin NVL72, reviewed June 25, 2026.
- NVIDIA Newsroom, NVIDIA Vera Rubin Opens Agentic AI Frontier, March 16, 2026.
- NVIDIA Newsroom, NVIDIA Vera Rubin Ramps Into Full Production to Power Agentic AI Factories Worldwide, May 31, 2026.
- NVIDIA, NVLink Fusion, reviewed June 25, 2026.
- UALink Consortium, UALink specifications, reviewed June 25, 2026.
- Ultra Ethernet Consortium, UEC launches Specification 1.0, June 11, 2025.
- U.S. Bureau of Industry and Security, Department of Commerce Announces Rescission of Biden-Era Artificial Intelligence Diffusion Rule, Strengthens Chip-Related Export Controls, May 13, 2025.
- U.S. Government Accountability Office, B-337935: Applicability of the Congressional Review Act to the Rescission of the Artificial Intelligence Diffusion Rule, May 12, 2026.
- U.S. Bureau of Industry and Security, Guidance Regarding Enforcement of License Requirements for Advanced Computing Items for Entities Headquartered in Country Group D:5 and Macau, May 31, 2026.