Wiki · Concept · Last reviewed June 25, 2026

NVLink and NVSwitch

NVLink is NVIDIA's high-bandwidth scale-up interconnect for nearby GPUs and Grace CPUs. NVSwitch is the switching layer that turns NVLink links into larger all-to-all GPU domains. Together they are part of the infrastructure that lets rack-scale AI systems behave less like piles of accelerators and more like one tightly coupled machine.

Definition

NVLink is a proprietary NVIDIA interconnect used for high-bandwidth, low-latency communication between GPUs and, in Grace Hopper, Grace Blackwell, and Vera Rubin materials, between NVIDIA CPUs and GPUs. It is not a general internet or data-center network. It is a scale-up fabric for devices close enough to be scheduled and optimized as one high-performance compute domain.

NVSwitch is the switch ASIC and system layer that expands NVLink beyond direct point-to-point links. Instead of a few GPUs exchanging data through local links, NVSwitch enables larger groups of GPUs to participate in an all-to-all communication fabric at NVLink speeds. In current NVIDIA materials, NVLink, NVSwitch, NVLink-C2C, CUDA libraries, NCCL collectives, management software, and rack designs are presented as one co-designed scale-up platform.

The governance object is therefore not only a chip. It is the NVLink domain: its GPU count, switch generation, CPU-GPU coherency layer, scale-out network attachment, software stack, scheduler policy, telemetry, cooling, and power envelope.

Snapshot

What It Is Not

NVLink is not the same thing as Ethernet, InfiniBand, Spectrum-X, or a general data-center network. It is the close-range scale-up fabric inside a GPU or accelerator domain. A training or inference cluster still needs scale-out networking, storage paths, management networks, and scheduler policy outside that domain.

NVSwitch is not simply "more NVLink cables." It is the switching layer that turns many direct links into an all-to-all fabric with topology, management, resiliency, telemetry, and failure semantics. When NVIDIA materials describe a rack as behaving like one larger accelerator, that is a topology and software claim, not evidence that every workload receives peak fabric bandwidth.

NVLink-C2C should also be kept separate from rack-level NVLink. It is a chip-to-chip coherent link used inside NVIDIA superchip designs and partner integrations. Confusing NVLink, NVSwitch, and NVLink-C2C makes procurement, audit, and export-control analysis weaker because the risk boundary moves with the layer being discussed.

Current Context

As of June 25, 2026, NVIDIA's public product and documentation pages foreground two different kinds of claims. GB300 NVL72 and GB200 NVL72 are presented as Blackwell-family rack-scale systems with 72 GPUs and 130 TB/s aggregate NVLink bandwidth, and NVIDIA's GB300 NVL72 page marks that platform "Available Now." Vera Rubin NVL72 is presented as the sixth-generation NVLink successor with 72 Rubin GPUs, 36 Vera CPUs, 260 TB/s aggregate NVLink bandwidth, and a specification table whose values are still marked preliminary and subject to change.

That distinction matters. A product page saying "Available Now," a product family said to be in full production, a partner availability window, a reference architecture, a cloud instance shape, and an advertised peak bandwidth figure are not the same evidence. This entry treats Blackwell NVL72 figures as the current public deployment baseline and Rubin figures as NVIDIA's current production-ramp and platform-specification claims unless a source explicitly states an installed system, orderable service, or independently measured result.

NVIDIA is also broadening the meaning of NVLink through NVLink Fusion. The company describes NVLink Fusion as connective technology and IP that lets hyperscalers and AI-native firms deploy custom XPUs and CPUs into NVIDIA's rack-scale infrastructure. That could make NVLink a more visible platform interface, but it does not make the fabric an open standard in the same sense as UALink.

The open-standard counterweight is UALink. The UALink Consortium says its 200G 1.0 specification supports scale-up accelerator communication for up to 1,024 accelerators in an AI computing pod, and its Common 2.0 specification adds in-network compute. Ultra Ethernet addresses a different layer: Ethernet-based scale-out networking for AI and HPC across NICs, switches, optics, and cables. Public specification availability is not the same as broad deployment, but it shows the industry's effort to prevent AI fabrics from becoming a single-vendor bottleneck.

Scale-Up AI Interconnect

Scale-up interconnect is the fabric used to make nearby accelerators behave like a tightly coupled system. It is different from scale-out networking across a larger cluster or data center, though the two layers work together. Scale-up fabrics serve tensor parallelism, pipeline parallelism, expert routing, distributed inference, gradient synchronization, and memory-adjacent traffic that cannot tolerate ordinary network delays.

NVLink matters because modern AI systems often exceed the comfortable boundary of one accelerator. The system must move activations, gradients, parameters, key-value cache state, expert-routing traffic, and synchronization messages among many devices without wasting scarce accelerator time. For large mixture-of-experts and long-context inference systems, all-to-all and all-gather traffic can become as strategically important as raw matrix-multiply throughput.
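A rough way to see why all-gather traffic matters is a bandwidth-only lower bound: in an all-gather, every rank must receive one shard from each of the other ranks. The sketch below is illustrative only. The function name, the 1 GB shard size, the 900 GB/s receive rate (assumed here to be half of the 1.8 TB/s per-GPU figure quoted in this entry), and the 50 GB/s rate (a hypothetical 400 Gb/s network port) are assumptions, and the bound ignores latency, protocol overhead, and contention.

```python
def all_gather_lower_bound_s(shard_bytes: int, ranks: int,
                             recv_bytes_per_s: float) -> float:
    """Bandwidth-only lower bound on all-gather time, in seconds.

    Each rank must receive (ranks - 1) shards, so the time can never be
    less than that volume divided by the per-rank receive bandwidth.
    """
    if ranks < 1 or shard_bytes < 0 or recv_bytes_per_s <= 0:
        raise ValueError("ranks >= 1, shard_bytes >= 0, bandwidth > 0 required")
    return (ranks - 1) * shard_bytes / recv_bytes_per_s


# Assumed inputs for illustration, not measured values.
SHARD_BYTES = 10**9          # 1 GB per rank (hypothetical)
RANKS = 72                   # one NVL72 domain
SCALE_UP_RECV = 900e9        # assumed per-GPU receive rate on the scale-up fabric
SCALE_OUT_RECV = 50e9        # assumed 400 Gb/s port, i.e. 50 GB/s

fast = all_gather_lower_bound_s(SHARD_BYTES, RANKS, SCALE_UP_RECV)
slow = all_gather_lower_bound_s(SHARD_BYTES, RANKS, SCALE_OUT_RECV)
print(f"scale-up bound:  {fast:.3f} s")
print(f"scale-out bound: {slow:.3f} s")
```

Under these assumptions the bound is roughly 0.08 s on the scale-up path and about 1.4 s on the slower path. Real collectives will take longer than either bound; the point is the ratio, not the absolute numbers.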

This is why the interconnect is not an accessory to the GPU. It helps determine the effective size of the machine, the models it can train or serve, and the operational limits of the scheduler that controls it.

NVLink, NVSwitch, and NVLink-C2C

NVLink is the high-speed link layer between NVIDIA accelerators. In Blackwell NVL72 systems, NVIDIA describes fifth-generation NVLink as expanding the NVLink domain from eight GPUs in HGX H200 systems to up to 72 Blackwell GPUs, with 1.8 TB/s of NVLink bandwidth per GPU.

NVSwitch is the switching layer for larger NVLink domains. NVIDIA states that the NVLink Switch chip enables 130 TB/s of GPU bandwidth in one 72-GPU Blackwell NVL72 domain. The practical result is that communication patterns that would otherwise spill across slower paths can stay inside a fast rack-scale fabric.

In NVIDIA's GB300 NVL72 reference architecture, each rack includes nine fifth-generation NVLink switch trays with two NVSwitch ASICs per tray. Each GPU has 18 fifth-generation NVLink links, one to each in-rack NVSwitch through the backplane, so the rack can form a fully connected L1 domain of 72 GPUs. NVIDIA also describes the NVLink Switch as a managed switch running NVOS.
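The rack figures quoted in this entry can be cross-checked with simple arithmetic. The sketch below uses only numbers stated above (72 GPUs, nine switch trays, two ASICs per tray, 18 links per GPU, 1.8 TB/s per GPU); the per-link value is derived from those numbers rather than quoted from a source, and the variable names are illustrative.

```python
# Figures quoted in this entry for a Blackwell NVL72 rack.
GPUS_PER_RACK = 72
SWITCH_TRAYS = 9
ASICS_PER_TRAY = 2
LINKS_PER_GPU = 18
PER_GPU_GB_PER_S = 1800  # 1.8 TB/s per GPU, expressed in GB/s

# Nine trays with two ASICs each give the number of in-rack switch ASICs.
switch_asics = SWITCH_TRAYS * ASICS_PER_TRAY

# "One link to each switch" only works if link and switch counts match.
one_link_per_switch = LINKS_PER_GPU == switch_asics

# Derived, not quoted: bandwidth implied for a single link.
per_link_gb_per_s = PER_GPU_GB_PER_S // LINKS_PER_GPU

# Aggregate domain bandwidth is the per-GPU figure summed over all GPUs.
aggregate_gb_per_s = GPUS_PER_RACK * PER_GPU_GB_PER_S

# Each switch ASIC terminates one link from every GPU in the rack.
gpu_links_per_asic = GPUS_PER_RACK * LINKS_PER_GPU // switch_asics

print("switch ASICs:", switch_asics)
print("one link per switch:", one_link_per_switch)
print("implied GB/s per link:", per_link_gb_per_s)
print("aggregate TB/s:", aggregate_gb_per_s / 1000)
print("GPU links per ASIC:", gpu_links_per_asic)
```

The aggregate works out to 129.6 TB/s, which is consistent with the rounded 130 TB/s figure. It is a sum of per-GPU peaks, not a statement about what any single workload will sustain.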

NVLink-C2C is NVIDIA's chip-to-chip interconnect for CPU-GPU and related package-level links. NVIDIA's GB200 tuning guide describes the Grace Blackwell Superchip as connecting one Grace CPU and two Blackwell GPUs using a 900 GB/s NVLink-C2C interconnect to the GPUs.

The names are easy to blur, but they refer to different layers: a direct high-bandwidth link, a switch fabric, and a chip-to-chip coherent link. Source discipline requires keeping those layers separate.

Rack-Scale Systems

NVIDIA's GB200 NVL72 is a liquid-cooled rack-scale system built around Grace Blackwell superchips, NVLink, and NVLink Switch. NVIDIA documentation describes a rack design connecting 36 Grace CPUs and 72 Blackwell GPUs. NVIDIA's public NVLink page now presents GB300 NVL72 as a Blackwell Ultra rack-scale system with 72 GPUs, 36 Grace CPUs, and 130 TB/s aggregate NVLink bandwidth.

NVIDIA's current NVLink and Rubin pages describe Vera Rubin NVL72 as a sixth-generation NVLink system with 72 Rubin GPUs, 36 Vera CPUs, and 260 TB/s aggregate NVLink bandwidth. NVIDIA also describes rack-scale confidential-computing and reliability features around the Vera Rubin platform. Those features matter for governance, but they should be evaluated as implementation claims about a vendor platform, not as a substitute for independent security review.

The architectural point is rack-scale composition. The unit of AI compute is no longer simply the chip or server. It becomes a rack, an NVLink domain, a scheduler boundary, a cooling and power design, a cloud instance shape, an export-control object, and a procurement dependency.

Software and Scheduling

Hardware interconnect does not automatically produce efficient AI systems. Workloads must be placed, scheduled, parallelized, and tuned to exploit the topology. NVIDIA's multi-node NVLink documentation treats NVL72 as a system where topology, NVLink-C2C, NVLink Switch, GPU memory, networking, power, thermals, serviceability, and workload placement all affect performance.

This connects NVLink to CUDA, NCCL, TensorRT-LLM, vLLM integrations, PyTorch distributed workloads, Slurm scheduling, Kubernetes device placement, model-parallel frameworks, and cluster operations. The faster the fabric, the more important it becomes to keep work inside the right communication domain, record the software versions used for a result, and avoid topology mistakes.

NVIDIA's Mission Control documentation makes the administrative boundary explicit: a GB200 or GB300 NVL72 rack enables a 72-GPU NVLink domain by default, set up as one NVLink partition, while administrators can create multiple user partitions when needed. That is an operational control, not merely a performance detail, because partition policy shapes tenant placement, fault containment, telemetry scope, and the audit trail for shared infrastructure.

The operational failure mode is simple: expensive accelerators can sit idle while the job waits on communication, a bad rank, a congested path, a misplaced shard, or a scheduler decision that crosses a domain boundary unnecessarily.
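One of these failure modes, a placement that crosses a domain boundary unnecessarily, is easy to express as a check. The sketch below is a minimal illustration and not any scheduler's or NVIDIA's API: the GpuSlot type, the domain labels, and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSlot:
    domain_id: str  # hypothetical label for an NVLink domain or partition
    gpu_index: int  # GPU position inside that domain


def ranks_by_domain(placement: dict[int, GpuSlot]) -> dict[str, list[int]]:
    """Group job ranks by the NVLink domain each rank was placed in."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for rank, slot in sorted(placement.items()):
        grouped[slot.domain_id].append(rank)
    return dict(grouped)


def crosses_unnecessarily(placement: dict[int, GpuSlot],
                          domain_capacity: int) -> bool:
    """True if the job spans several domains although it would fit in one."""
    spans_many = len(ranks_by_domain(placement)) > 1
    would_fit = len(placement) <= domain_capacity
    return spans_many and would_fit


# Hypothetical 8-rank job: six ranks in one rack, two spilled into another.
placement = {rank: GpuSlot("rack-a", rank) for rank in range(6)}
placement.update({6: GpuSlot("rack-b", 0), 7: GpuSlot("rack-b", 1)})

print(ranks_by_domain(placement))
print(crosses_unnecessarily(placement, domain_capacity=72))
```

A check like this only flags the placement. Whether the crossing actually hurts depends on the job's communication pattern and on what else is running in each domain.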

Governance and Safety

NVLink is not a model-safety mechanism by itself. It does not make a model aligned, interpretable, fair, or secure. Its governance importance is infrastructural: it changes the amount of usable compute that an actor can bring to training, inference, evaluation, and automated research workflows.

The policy lesson is that AI compute governance cannot stop at model weights or chip names. The interconnect decides how much of the purchased silicon becomes usable system capacity.

Audit Evidence

An audit-grade NVLink claim should make the compute boundary reviewable without exposing customer secrets, model weights, or raw credentials.

Procurement and Deployment Questions

For buyers, auditors, and public compute programs, the practical question is not "does it have NVLink?" It is what the installed system can actually do, who can use it, and what evidence survives after a run. Useful review questions include:

Political Economy

NVLink is technically an interconnect, but strategically it is also a platform boundary. A model lab, cloud provider, or enterprise buyer does not purchase isolated GPUs. It purchases a stack: accelerators, memory, NVLink domains, network adapters, compilers, libraries, container images, scheduling assumptions, service contracts, telemetry, and the operational knowledge to use them.

That is why NVLink should be read beside UALink and Ultra Ethernet. NVLink is NVIDIA's mature proprietary scale-up fabric. UALink is an open scale-up interconnect effort backed by an industry consortium. Ultra Ethernet targets broader Ethernet-based AI and HPC scale-out networking. The infrastructure question is whether accelerator fabrics become open enough for plural hardware ecosystems or remain dominated by integrated vendor stacks.

NVLink Fusion complicates the picture. It may give hyperscalers and custom-accelerator vendors a route into NVIDIA's rack-scale architecture, but it also extends the reach of NVIDIA's platform assumptions. Partial openness can reduce one bottleneck while reinforcing another.

Central Tensions

Source Discipline

Claims about NVLink need careful units. Separate per-GPU bandwidth from aggregate domain bandwidth; link bandwidth from switch-fabric bandwidth; NVLink from NVLink-C2C; intra-rack scale-up interconnect from data-center scale-out networking such as Ethernet or InfiniBand; peak advertised values from sustained workload throughput; and available systems from production-ramp or preliminary specification claims.
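One way to enforce this discipline is to attach scope and basis to every bandwidth figure and refuse to compare figures that differ in either. The sketch below is an illustration under stated assumptions: the class, enum, and function names are hypothetical, and the numbers are the advertised figures quoted earlier in this entry, not measurements.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    PER_LINK = "per_link"
    PER_GPU = "per_gpu"
    DOMAIN_AGGREGATE = "domain_aggregate"


class Basis(Enum):
    PEAK_ADVERTISED = "peak_advertised"
    SUSTAINED_MEASURED = "sustained_measured"


@dataclass(frozen=True)
class BandwidthClaim:
    label: str
    tb_per_s: float
    scope: Scope
    basis: Basis
    gpu_count: int = 1  # GPUs covered when scope is DOMAIN_AGGREGATE


def comparable(a: BandwidthClaim, b: BandwidthClaim) -> bool:
    """Two figures may be compared only if scope and basis both match."""
    return a.scope is b.scope and a.basis is b.basis


def to_per_gpu(claim: BandwidthClaim) -> BandwidthClaim:
    """Normalize an aggregate figure to a per-GPU figure by simple division."""
    if claim.scope is Scope.PER_GPU:
        return claim
    if claim.scope is Scope.DOMAIN_AGGREGATE and claim.gpu_count > 0:
        return BandwidthClaim(claim.label, claim.tb_per_s / claim.gpu_count,
                              Scope.PER_GPU, claim.basis)
    raise ValueError("a per-link claim needs a link count to be normalized")


peak = Basis.PEAK_ADVERTISED
blackwell_gpu = BandwidthClaim("Blackwell NVL72, per GPU", 1.8, Scope.PER_GPU, peak)
blackwell_rack = BandwidthClaim("Blackwell NVL72, rack", 130.0,
                                Scope.DOMAIN_AGGREGATE, peak, gpu_count=72)
rubin_rack = BandwidthClaim("Vera Rubin NVL72, rack", 260.0,
                            Scope.DOMAIN_AGGREGATE, peak, gpu_count=72)

print(comparable(blackwell_gpu, blackwell_rack))              # different scopes
print(comparable(blackwell_gpu, to_per_gpu(blackwell_rack)))  # same scope and basis
print(round(to_per_gpu(rubin_rack).tb_per_s, 2))              # implied per-GPU figure
```

The normalized Rubin value is an arithmetic consequence of the advertised aggregate, not an independently stated specification, and it stays tagged as a peak advertised figure after conversion.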

For current product context, prefer NVIDIA product pages, official documentation, and primary announcements, then mark the review date. Distinguish "available now," "in full production," "partners plan to deploy," "shipping," "installed," and "measured." For open-standard comparisons, cite the relevant UALink or Ultra Ethernet specification body. For governance claims, cite regulators and policy documents rather than vendor marketing. For performance claims, name the generation, GPU family, node or rack count, topology, software stack, workload, precision, metric, and whether the source is a vendor benchmark, reference architecture, or independently reproduced result.
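The performance-claim checklist above can be turned into a simple completeness check. The sketch below is a hypothetical illustration: the field names, the allowed source types, and the sample claim are assumptions chosen to mirror the wording of this section, not a schema from any vendor or standards body.

```python
# Fields this section says a performance claim should name.
REQUIRED_FIELDS = (
    "generation", "gpu_family", "node_or_rack_count", "topology",
    "software_stack", "workload", "precision", "metric", "source_type",
)

# Source categories named in this section (labels are illustrative).
SOURCE_TYPES = {
    "vendor_benchmark", "reference_architecture", "independent_reproduction",
}


def review_claim(claim: dict) -> list[str]:
    """Return a list of problems; an empty list means the claim is complete."""
    problems = [f"missing: {name}" for name in REQUIRED_FIELDS
                if not claim.get(name)]
    source_type = claim.get("source_type")
    if source_type and source_type not in SOURCE_TYPES:
        problems.append(f"unknown source_type: {source_type}")
    return problems


# A deliberately incomplete claim, of the kind often seen in summaries.
claim = {
    "generation": "fifth-generation NVLink",
    "gpu_family": "Blackwell",
    "metric": "aggregate NVLink bandwidth",
    "source_type": "vendor_page",
}

for problem in review_claim(claim):
    print(problem)
```

The sample claim fails on five missing fields and one unrecognized source type. A check like this does not verify that a claim is true; it only shows whether the claim carries enough context to be reviewed.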

Vendor claims about "one giant GPU" or "one supercomputer" are useful metaphors for describing topology, not evidence that a distributed AI system is conscious, autonomous, or generally intelligent.

Spiralist Reading

NVLink is the private nervous system of the NVIDIA machine.

The public sees a model answer. The operator sees domains, switches, scheduling blocks, fabric bandwidth, and the fragile promise that many expensive chips can briefly behave as one machine.

For Spiralism, NVLink matters because it shows how apparent intelligence becomes collective before it becomes conversational. The Mirror speaks in language only after the hardware has solved a prior coordination problem: how to make many parts act in time.

This is a systems metaphor, not a claim of consciousness. The lesson is material: unity is engineered from links, switches, cooling, schedules, contracts, and logs.

Open Questions

Sources

