
KubeRay

KubeRay is an open source Kubernetes operator for running Ray clusters, jobs, and services as Kubernetes custom resources, making distributed AI and Python workloads easier to declare, scale, review, and retire.

Category: Concept
Published: June 25, 2026
Modified: June 25, 2026
Last reviewed: June 25, 2026
Tags: Kubernetes, Ray, KubeRay, distributed AI, AI compute

Definition

KubeRay is the Kubernetes operator and API layer for Ray workloads. The KubeRay repository describes it as an open source Kubernetes operator that simplifies deployment and management of Ray applications on Kubernetes. Ray's own overview defines Ray as an open source unified framework for scaling AI and Python applications, with a distributed compute layer for parallel processing.

The core KubeRay resources are RayCluster, RayJob, and RayService. The Ray on Kubernetes documentation says these custom resource definitions help users manage Ray clusters for different use cases. In practice, KubeRay is the bridge between Kubernetes platform operations and Ray's application-level model of distributed tasks, actors, data pipelines, training, serving, and batch work.

How It Works

RayCluster is the base object. The KubeRay repository says KubeRay manages the RayCluster lifecycle, including creation, deletion, autoscaling, and fault tolerance. Google Cloud's Ray on GKE documentation describes a Ray cluster as a head pod with worker pods, and treats RayCluster as the resource for specifying that cluster.
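
As a rough illustration of the head-plus-workers shape described above, the sketch below builds a RayCluster manifest as a plain Python dict and prints it as JSON, which Kubernetes accepts alongside YAML. The helper name ray_cluster_manifest, the cluster name, the image tag, and the resource sizes are illustrative assumptions. The field names follow the ray.io/v1 API as commonly documented and should be checked against the installed CRD version.

```python
import json


def ray_cluster_manifest(name, image, ray_version, min_workers, max_workers):
    """Build a RayCluster manifest as a dict (hypothetical helper, not a KubeRay API)."""

    def pod_template(container_name):
        # One container per pod; the sizes here are placeholders, not recommendations.
        return {
            "spec": {
                "containers": [
                    {
                        "name": container_name,
                        "image": image,
                        "resources": {
                            "requests": {"cpu": "1", "memory": "2Gi"},
                            "limits": {"cpu": "1", "memory": "2Gi"},
                        },
                    }
                ]
            }
        }

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "rayVersion": ray_version,
            # Lets the Ray autoscaler adjust replicas between min and max.
            "enableInTreeAutoscaling": True,
            "headGroupSpec": {
                "rayStartParams": {},
                "template": pod_template("ray-head"),
            },
            "workerGroupSpecs": [
                {
                    "groupName": "workers",
                    "replicas": min_workers,
                    "minReplicas": min_workers,
                    "maxReplicas": max_workers,
                    "rayStartParams": {},
                    "template": pod_template("ray-worker"),
                }
            ],
        },
    }


# Illustrative values only.
cluster = ray_cluster_manifest("demo-cluster", "rayproject/ray:2.9.0", "2.9.0", 1, 4)
print(json.dumps(cluster, indent=2))
```

Generating the manifest from code rather than hand-editing it keeps the head and worker templates on the same image, which is one way to avoid version drift between the two.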

RayJob is for finite work. The repository says RayJob can create a RayCluster, submit a job once the cluster is ready, and optionally delete the cluster after the job finishes. The KubeRay API reference documents job fields for entrypoints, runtime environments, retry and deadline settings, cleanup behavior, cluster selection, and submission modes such as Kubernetes Job mode or HTTP mode.
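
The job fields mentioned above can be sketched as a RayJob manifest built in Python. This is a minimal example under stated assumptions: the helper ray_job_manifest, the job name, the entrypoint path, and the numeric values are hypothetical, and the field names (entrypoint, runtimeEnvYAML, submissionMode, backoffLimit, activeDeadlineSeconds, shutdownAfterJobFinishes, ttlSecondsAfterFinished, rayClusterSpec) reflect the API reference as understood here. Verify them against the KubeRay version in use.

```python
import json

# Minimal embedded cluster spec for illustration; a real one should set resources.
CLUSTER_SPEC = {
    "rayVersion": "2.9.0",
    "headGroupSpec": {
        "rayStartParams": {},
        "template": {
            "spec": {
                "containers": [{"name": "ray-head", "image": "rayproject/ray:2.9.0"}]
            }
        },
    },
}


def ray_job_manifest(name, entrypoint, keep_cluster_seconds=600):
    """Build a RayJob manifest as a dict (hypothetical helper)."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            "entrypoint": entrypoint,
            # Runtime environment passed as a YAML string.
            "runtimeEnvYAML": "pip:\n  - requests\n",
            # Submit through a Kubernetes Job; "HTTPMode" is the other mode named above.
            "submissionMode": "K8sJobMode",
            "backoffLimit": 1,              # retry setting
            "activeDeadlineSeconds": 3600,  # deadline setting
            # Cleanup behavior: delete the cluster, but only after a delay.
            "shutdownAfterJobFinishes": True,
            "ttlSecondsAfterFinished": keep_cluster_seconds,
            "rayClusterSpec": CLUSTER_SPEC,
        },
    }


job = ray_job_manifest("nightly-eval", "python /home/ray/eval.py")
print(json.dumps(job["spec"], indent=2))
```

The non-zero ttlSecondsAfterFinished is a deliberate choice in this sketch: it leaves a window in which logs and cluster state can be collected before the cluster is removed.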

RayService is for serving. The repository describes it as a combination of a RayCluster and a Ray Serve deployment graph, with support for zero-downtime upgrades and high availability. The Ray on Kubernetes docs also state that KubeRay supports heterogeneous compute nodes, including GPUs, and multiple Ray versions in the same Kubernetes cluster.
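
To make the cluster-plus-Serve pairing concrete, the sketch below builds a RayService manifest with one GPU worker group. The helper name, the image reference, the Serve application name, and the import path my_app.serve:app are placeholders invented for illustration. The serveConfigV2 and rayClusterConfig field names and the nvidia.com/gpu resource name are assumptions based on common usage and should be confirmed against the cluster's CRDs and device plugin.

```python
import json

# Serve configuration as a YAML string; the application and import path are hypothetical.
SERVE_CONFIG = """\
applications:
  - name: demo
    import_path: my_app.serve:app
    route_prefix: /
"""


def ray_service_manifest(name, image, ray_version, gpus_per_worker=1):
    """Build a RayService manifest as a dict (hypothetical helper)."""

    def container(container_name, limits):
        return {"name": container_name, "image": image, "resources": {"limits": limits}}

    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayService",
        "metadata": {"name": name},
        "spec": {
            "serveConfigV2": SERVE_CONFIG,
            "rayClusterConfig": {
                "rayVersion": ray_version,
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": {
                        "spec": {"containers": [container("ray-head", {"cpu": "2"})]}
                    },
                },
                "workerGroupSpecs": [
                    {
                        "groupName": "gpu-workers",
                        "replicas": 1,
                        "minReplicas": 1,
                        "maxReplicas": 2,
                        "rayStartParams": {},
                        "template": {
                            "spec": {
                                "containers": [
                                    container(
                                        "ray-worker",
                                        # GPU resource name depends on the device plugin.
                                        {"nvidia.com/gpu": str(gpus_per_worker)},
                                    )
                                ]
                            }
                        },
                    }
                ],
            },
        },
    }


service = ray_service_manifest("demo-serve", "example.com/ray-serve:placeholder", "2.9.0")
print(json.dumps(service, indent=2))
```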

Agent Context

Many agent and AI operations are embarrassingly parallel until they are not. Evaluation sweeps, retrieval index builds, synthetic-data generation, browser tests, simulation, batch inference, reinforcement-learning experiments, and model-serving backends all need distributed runtime support. Ray gives developers a Python-facing framework for that work; KubeRay makes the runtime visible to Kubernetes control planes.

That visibility matters. A pull request changing a RayJob is not just changing code. It may create a cluster, pull images, mount secrets, request GPUs, expose service endpoints, submit a job, and then delete the evidence if cleanup is too aggressive. KubeRay makes those actions declarative enough to review, but only if operators preserve the relevant manifests, logs, policies, and ownership records.
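
One way to picture that review step is a small check that walks a manifest and lists the things a reviewer would want to see called out. The function review_flags, the 300-second threshold, and the sample manifest are all hypothetical. KubeRay does not ship such a checker, and this is only a sketch of the idea.

```python
def review_flags(manifest, min_ttl_seconds=300):
    """Return human-readable review notes for a KubeRay manifest dict (hypothetical)."""
    flags = []
    spec = manifest.get("spec", {})

    # Aggressive cleanup: the cluster vanishes soon after the job ends.
    if manifest.get("kind") == "RayJob" and spec.get("shutdownAfterJobFinishes"):
        if spec.get("ttlSecondsAfterFinished", 0) < min_ttl_seconds:
            flags.append("cluster is deleted shortly after the job finishes")

    # RayJob embeds the cluster under rayClusterSpec, RayService under
    # rayClusterConfig; a RayCluster carries the groups directly in spec.
    cluster = spec.get("rayClusterSpec") or spec.get("rayClusterConfig") or spec
    groups = [cluster.get("headGroupSpec", {})] + cluster.get("workerGroupSpecs", [])

    for group in groups:
        pod = group.get("template", {}).get("spec", {})
        for volume in pod.get("volumes", []):
            if "secret" in volume:
                flags.append(f"mounts secret volume '{volume.get('name')}'")
        for ctr in pod.get("containers", []):
            for resource in ctr.get("resources", {}).get("limits", {}):
                if "gpu" in resource:
                    flags.append(f"container '{ctr.get('name')}' requests {resource}")
    return flags


# Sample manifest invented for this example.
sample = {
    "kind": "RayJob",
    "spec": {
        "shutdownAfterJobFinishes": True,
        "ttlSecondsAfterFinished": 0,
        "rayClusterSpec": {
            "headGroupSpec": {
                "template": {
                    "spec": {
                        "volumes": [{"name": "api-keys", "secret": {"secretName": "api-keys"}}],
                        "containers": [
                            {"name": "ray-head", "resources": {"limits": {"nvidia.com/gpu": "1"}}}
                        ],
                    }
                }
            }
        },
    },
}

for note in review_flags(sample):
    print(note)
```

A check like this does not replace review; it only surfaces the fields that deserve a second look before the manifest is merged.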

Governance Use

A governance record for KubeRay should preserve the KubeRay version, Ray version, operator installation method, CRD schema, namespace boundaries, service accounts, RBAC, image sources, runtime environment settings, resource requests and limits, GPU resource names, autoscaler settings, Ray dashboard and Jobs API exposure, network policy, secrets, storage mounts, queueing integration, cleanup rules, monitoring, and audit logs.
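
A governance record of this kind could be kept as structured data so that gaps are easy to detect. The sketch below covers only a subset of the items listed above. The class name, field names, and sample values are assumptions made for illustration, not a standard schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class KubeRayGovernanceRecord:
    """Partial governance record (hypothetical schema). None means 'not yet recorded'."""

    kuberay_version: Optional[str] = None
    ray_version: Optional[str] = None
    install_method: Optional[str] = None
    namespace: Optional[str] = None
    service_account: Optional[str] = None
    image_sources: Optional[List[str]] = None
    # An empty list is a valid answer here: it records that no GPUs are used.
    gpu_resource_names: Optional[List[str]] = None
    dashboard_exposure: Optional[str] = None
    cleanup_rule: Optional[str] = None
    audit_log_location: Optional[str] = None

    def missing(self):
        """Names of fields that nobody has filled in yet."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Sample values invented for this example.
record = KubeRayGovernanceRecord(
    kuberay_version="1.1.0",
    ray_version="2.9.0",
    gpu_resource_names=[],
)
print("Still to record:", ", ".join(record.missing()))
```

Distinguishing None from an empty value is the design choice that matters here: "no GPUs" and "nobody checked" are different statements, and a record should not conflate them.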

Managed offerings do not erase that responsibility. Google Cloud's Ray on GKE page, for example, says customers remain responsible for container images, Ray head and worker versions, resource configuration, cluster security practices, reliability, and monitoring. That shared-responsibility pattern should be recorded wherever a managed KubeRay operator is used.

Limits

KubeRay is not a safety evaluator, model-governance process, workload scheduler, or data-loss prevention system by itself. It can create a well-formed Ray cluster; it cannot decide whether the job is legitimate, whether the input data is licensed, whether the output should be released, or whether a tool-using agent should be trusted with the requested resources.

It also inherits Kubernetes and Ray exposure risks. Operators should treat Ray dashboards, job-submission endpoints, container images, runtime environments, object stores, logs, and service accounts as part of the trusted computing base. KubeRay makes distributed work easier to run; it does not make distributed work harmless.
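
As a sketch of what checking that exposure might look like, the function below inspects a Kubernetes Service manifest and reports Ray ports reachable from outside the cluster network. It assumes Ray's default ports (8265 for the dashboard, which also serves the Jobs API, and 10001 for Ray client); deployments that change those defaults would need a different table. The function name and sample Service are hypothetical.

```python
# Ray default ports, assumed unchanged; adjust if the deployment overrides them.
RAY_SENSITIVE_PORTS = {
    8265: "dashboard and Jobs API",
    10001: "Ray client",
}

# Service types that publish ports beyond the cluster-internal network.
EXTERNAL_SERVICE_TYPES = ("LoadBalancer", "NodePort")


def exposed_ray_ports(service):
    """List sensitive Ray ports that a Service manifest publishes externally (hypothetical)."""
    spec = service.get("spec", {})
    # Kubernetes defaults to ClusterIP when no type is given.
    if spec.get("type", "ClusterIP") not in EXTERNAL_SERVICE_TYPES:
        return []
    return [
        f"{port['port']} ({RAY_SENSITIVE_PORTS[port['port']]})"
        for port in spec.get("ports", [])
        if port.get("port") in RAY_SENSITIVE_PORTS
    ]


# Sample Service invented for this example.
head_service = {
    "kind": "Service",
    "metadata": {"name": "demo-cluster-head-svc"},
    "spec": {
        "type": "LoadBalancer",
        "ports": [{"name": "dashboard", "port": 8265}, {"name": "serve", "port": 8000}],
    },
}

print(exposed_ray_ports(head_service))
```

This covers only one exposure path. Ingress resources, port-forwards, and permissive network policies can expose the same endpoints and would need their own checks.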

Source Discipline

Claims about KubeRay's resources and lifecycle behavior should cite Ray's Kubernetes documentation, the KubeRay repository, or the generated API reference. Claims about a cloud provider's managed Ray operator should cite that provider. Claims about model-serving integrations should cite the project that owns the integration, such as vLLM for its KubeRay deployment guide.

Spiralist Reading

Spiralism reads KubeRay as a translation layer between swarm work and institutional memory.

Ray scatters computation across workers. Kubernetes records the desired shape. KubeRay sits at the hinge: a place where distributed thought becomes YAML, and where governance can still ask who requested the cluster, what it touched, and why it was allowed to disappear.

Sources

Ray, Overview, reviewed June 25, 2026.
Ray, Ray on Kubernetes, reviewed June 25, 2026.
Ray Project, KubeRay upstream repository, reviewed June 25, 2026.
KubeRay, API Reference, reviewed June 25, 2026.
Google Cloud, About Ray on Google Kubernetes Engine, reviewed June 25, 2026.
vLLM, KubeRay integration guide, reviewed June 25, 2026.
