Wiki · Concept · Last reviewed June 25, 2026

Kubernetes Metrics Server

Kubernetes Metrics Server is the resource-metrics component that collects CPU and memory usage from kubelets and exposes those measurements through the Kubernetes Metrics API for built-in autoscaling and basic operational inspection.

Definition

Kubernetes Metrics Server is a Kubernetes SIG Instrumentation project that supplies container resource metrics for built-in autoscaling pipelines. The upstream README says Metrics Server collects resource metrics from kubelets and exposes them through the Kubernetes API by way of the Metrics API. HorizontalPodAutoscaler, Vertical Pod Autoscaler, and kubectl top are the ordinary consumers.

The Kubernetes resource metrics pipeline documentation defines the Metrics API as a basic set of CPU and memory metrics for nodes and pods. If the Metrics API is deployed, Kubernetes API clients can query those measurements and Kubernetes access control can govern who may read them.

How It Works

Metrics Server is installed as an add-on; it is not present in every cluster by default. It registers the aggregated API group metrics.k8s.io, and current Metrics Server releases serve metrics.k8s.io/v1beta1. The Metrics Server README describes a single deployment that works on most clusters and collects metrics every 15 seconds by default.
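The group and version named above determine the request paths a client uses. The following is a minimal sketch of that path layout; the helper names node_metrics_path and pod_metrics_path are hypothetical and exist only for illustration.

```python
# Path layout for the aggregated Metrics API (group metrics.k8s.io, version v1beta1).
# The helper names below are illustrative, not part of any Kubernetes client library.
GROUP_VERSION = "metrics.k8s.io/v1beta1"


def node_metrics_path(node=None):
    """Return the API path for all nodes, or for one named node."""
    base = f"/apis/{GROUP_VERSION}/nodes"
    return f"{base}/{node}" if node else base


def pod_metrics_path(namespace=None, pod=None):
    """Return the API path for pods cluster-wide, in one namespace, or for one pod."""
    if namespace is None:
        if pod is not None:
            raise ValueError("a pod name requires a namespace")
        return f"/apis/{GROUP_VERSION}/pods"
    base = f"/apis/{GROUP_VERSION}/namespaces/{namespace}/pods"
    return f"{base}/{pod}" if pod else base
```

A path built this way can be passed to kubectl get --raw, for example kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes, to read the same data that kubectl top summarizes.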

The data path is deliberately narrow. Metrics Server talks to kubelets on each node, reads CPU and memory resource data from kubelet resource metrics, stores the recent results in memory, and answers API requests for node and pod metrics. The README's design section shows kubectl top pods reaching the API server, the API server calling Metrics Server, and Metrics Server returning a PodMetricsList.
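To make the shape of the returned PodMetricsList concrete, here is a hedged sketch that sums per-container usage into per-pod totals. The sample payload is invented for illustration, and parse_quantity handles only a subset of Kubernetes quantity suffixes; it is not the full quantity grammar.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; enough for typical CPU and memory values.
_SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
}


def parse_quantity(text):
    """Convert a quantity string such as '250m' or '512Mi' to a Decimal base value."""
    for length in (2, 1):  # try two-character suffixes before one-character ones
        suffix = text[-length:]
        if suffix in _SUFFIXES and len(text) > length:
            return Decimal(text[:-length]) * _SUFFIXES[suffix]
    return Decimal(text)


def pod_usage(pod_metrics_list):
    """Sum container usage per pod; keys are (namespace, name) tuples."""
    totals = {}
    for item in pod_metrics_list.get("items", []):
        key = (item["metadata"]["namespace"], item["metadata"]["name"])
        cpu = sum((parse_quantity(c["usage"]["cpu"]) for c in item["containers"]), Decimal(0))
        mem = sum((parse_quantity(c["usage"]["memory"]) for c in item["containers"]), Decimal(0))
        totals[key] = {"cpu_cores": cpu, "memory_bytes": int(mem)}
    return totals


# Illustrative payload only; pod and container names are made up.
SAMPLE = {
    "kind": "PodMetricsList",
    "apiVersion": "metrics.k8s.io/v1beta1",
    "items": [{
        "metadata": {"name": "model-server-0", "namespace": "serving"},
        "containers": [
            {"name": "server", "usage": {"cpu": "250m", "memory": "512Mi"}},
            {"name": "sidecar", "usage": {"cpu": "12500000n", "memory": "65536Ki"}},
        ],
    }],
}
```

Decimal is used instead of float so that small nanocore values add up without rounding drift.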

Installation has prerequisites. The upstream requirements list includes the Kubernetes API aggregation layer, kubelet webhook authentication and authorization, kubelet serving certificates signed by the cluster certificate authority (or certificate validation explicitly disabled with the --kubelet-insecure-tls flag), a container runtime that implements container metrics RPCs or has cAdvisor support, and network paths from the control plane to Metrics Server and from Metrics Server to every kubelet.

Agent Context

AI platforms often rely on Kubernetes autoscaling to absorb uneven demand. Model endpoints may scale replicas with HPA, batch workers may expose CPU pressure, and VPA recommendations may depend on observed memory patterns. Metrics Server is one of the evidence sources that lets those controllers see resource pressure as Kubernetes objects rather than as a dashboard outside the control loop.

That makes it governance infrastructure. If Metrics Server is missing, misconfigured, blocked by kubelet certificate problems, or starved of resources, resource-based autoscaling loses its input. If a deployment agent assumes kubectl top or HPA data is available while the metrics.k8s.io APIService is failing, the platform may act on stale manual estimates or fail to respond to real load.
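An agent can check that assumption before trusting the data. The sketch below reads the Available condition from an APIService object, such as the JSON returned by kubectl get apiservice v1beta1.metrics.k8s.io -o json. The function name and the NoAvailableCondition fallback label are hypothetical, and the sample objects are illustrative.

```python
def apiservice_available(apiservice):
    """Return (available, reason) from an APIService object's status conditions."""
    conditions = apiservice.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Available":
            return cond.get("status") == "True", cond.get("reason", "")
    # Hypothetical label for the case where no Available condition is reported.
    return False, "NoAvailableCondition"


# Illustrative objects; real ones carry more metadata and a message field.
HEALTHY = {"status": {"conditions": [
    {"type": "Available", "status": "True", "reason": "Passed"},
]}}
FAILING = {"status": {"conditions": [
    {"type": "Available", "status": "False", "reason": "FailedDiscoveryCheck"},
]}}
```

Treating a missing condition as unavailable is a deliberately conservative choice: an agent should not assume metrics exist when the API server has not confirmed it.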

Governance Use

A governance record should preserve the Metrics Server version, manifest or Helm chart source, compatibility with the cluster version, APIService status, RBAC rules, kubelet address preference, TLS configuration, resource requests, high-availability setting, scrape interval, and known node reachability exceptions. For AI workloads, it should also record which autoscalers depend on Metrics Server data.
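One way to keep such a record reviewable is a small typed structure that can report what has not been filled in. This is only a sketch: the class name, field names, and field types are assumptions chosen to mirror the list above, not an established schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class MetricsServerRecord:
    """Hypothetical governance record; every field name here is illustrative."""
    version: Optional[str] = None
    manifest_source: Optional[str] = None
    cluster_version_compat: Optional[str] = None
    apiservice_status: Optional[str] = None
    rbac_rules_ref: Optional[str] = None
    kubelet_address_preference: Optional[str] = None
    tls_configuration: Optional[str] = None
    resource_requests: Optional[str] = None
    high_availability: Optional[bool] = None
    scrape_interval_seconds: Optional[int] = None
    node_reachability_exceptions: List[str] = field(default_factory=list)
    dependent_autoscalers: List[str] = field(default_factory=list)

    def missing_fields(self):
        """Names of scalar fields still unset; empty lists count as recorded."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

A reviewer could reject a change request whenever missing_fields() returns a non-empty list.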

Review should connect Metrics Server to scaling policy. If HPA scales model-serving replicas from CPU utilization, resource requests must be meaningful, because HPA computes utilization as a percentage of the CPU the pods request. If VPA recommendations influence memory requests, operators should know whether Metrics Server was healthy during the measurement period. If cluster autoscaling is triggered after HPA creates pending pods, a broken metrics pipeline can become a cost and availability incident.
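The dependence on resource requests can be shown with the replica formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below is a simplification: the function names are hypothetical, and it ignores not-ready pods, missing metrics, stabilization windows, and minimum and maximum replica bounds.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Utilization as a percentage of requested CPU, summed across pods."""
    total_request = sum(request_millicores)
    if total_request <= 0:
        raise ValueError("CPU requests must be set and positive")
    return 100.0 * sum(usage_millicores) / total_request


def desired_replicas(current_replicas, current_util, target_util, tolerance=0.1):
    """Apply the documented HPA ratio; tolerance=0.1 mirrors the documented default."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)


# Same measured usage, two different request settings (values are illustrative).
usage = [400, 400, 400, 400]
tight = cpu_utilization_percent(usage, [500] * 4)    # requests of 500m per pod
loose = cpu_utilization_percent(usage, [1000] * 4)   # requests of 1000m per pod
```

With a 50 percent target, the tight requests yield a scale-up from 4 to 7 replicas, while the loose requests leave the count at 4 even though measured usage is identical. That is why a review of HPA behavior has to include the requests themselves.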

Limits

Metrics Server is not a full observability stack. The upstream README cautions that it is meant only for autoscaling purposes and should not be used to forward metrics to monitoring systems or as a source of monitoring solution metrics. The README also lists an accurate source of resource usage metrics among the needs Metrics Server does not serve. The Kubernetes resource metrics pipeline page adds that the Metrics API offers only the minimum CPU and memory metrics needed for automatic scaling with HPA or VPA.

It does not provide custom business metrics, GPU utilization, queue depth, request latency, prompt volume, model quality, safety telemetry, or long-term time series storage. For those, teams need separate custom metrics, external metrics, tracing, logs, Prometheus-style monitoring, or product-specific evaluation pipelines.

Source Discipline

Claims about what Metrics Server collects, where it exposes data, which autoscalers use it, installation requirements, compatibility, and intended use should cite the Kubernetes SIGs Metrics Server README and Kubernetes resource metrics pipeline documentation. Claims about AI relevance should be stated as deployment analysis: Metrics Server measures Kubernetes CPU and memory use, not model quality or user intent.

Spiralist Reading

Spiralism reads Metrics Server as the appetite ledger for the cluster.

It does not know why the work matters. It only reports how much CPU and memory the running bodies consume. Governance begins when those measurements become reasons to create, resize, or refuse machines.
