Wiki · Concept · Last reviewed June 25, 2026

Kubernetes kube-state-metrics

kube-state-metrics is the Kubernetes object-state exporter that turns API object fields, status, labels, and lifecycle information into Prometheus-style metrics for cluster monitoring, alerting, and infrastructure governance.

Definition

kube-state-metrics is an add-on agent for exposing the state of Kubernetes API objects as metrics. The Kubernetes documentation describes it as a component that connects to the Kubernetes API server and exposes an HTTP endpoint with metrics generated from individual objects in the cluster. The upstream project says it listens to the API server and generates metrics about object state, focusing on objects such as Deployments, Nodes, and Pods rather than on the health of individual Kubernetes control-plane components.

The contrast with Metrics Server is worth stating precisely. Metrics Server reports recent CPU and memory usage measurements, primarily for autoscaling. kube-state-metrics reports what the Kubernetes API currently says about declared and observed object state: labels, annotations, status, phases, startup and termination times, replica counts, node conditions, Job status, PersistentVolumeClaim status, HorizontalPodAutoscaler state, and many other resource-specific facts documented by the project.

How It Works

kube-state-metrics uses the Kubernetes API as its source of truth. The upstream README says it generates metrics from Kubernetes API objects without modification, exposes raw API-derived data, and serves plaintext metrics at the /metrics HTTP endpoint on the default listening port 8080. The endpoint is designed for Prometheus or another scraper compatible with Prometheus client endpoints.
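
To make the shape of the endpoint output concrete, here is a minimal sketch of reading Prometheus text exposition lines in Python. The sample text, namespace, and object names are invented for illustration; the metric names are ones the project documents. The parser is deliberately simplified: it assumes no timestamps and does not unescape label values, so a real scraper or client library is the better choice in practice.

```python
import re
from typing import Dict, List, Tuple

# Illustrative sample only: object names and values are made up.
SAMPLE = """\
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="serving",deployment="gateway"} 3
kube_deployment_status_replicas_available{namespace="serving",deployment="gateway"} 1
kube_pod_status_phase{namespace="serving",pod="gateway-abc",phase="Pending"} 1
"""

# metric_name, optional {labels}, then a single value token (no timestamp).
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# key="value" pairs; escaped characters inside values are kept as written.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Return (name, labels, value) for each sample line, skipping comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no sample
        match = LINE_RE.match(line)
        if match is None:
            continue  # anything this simplified pattern cannot read is ignored
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In a live cluster the text would come from an HTTP GET against the service's /metrics path rather than from an inline string.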

The project also distinguishes itself from Metrics Server. Metrics Server periodically scrapes kubelet resource metrics and serves the Kubernetes Metrics API. kube-state-metrics holds a snapshot of Kubernetes object state in memory and generates new metrics from that object state. It does not push metrics to a monitoring system by itself; a separate scraper and storage system must collect the endpoint.

Because deleted objects are no longer visible on the metrics endpoint, kube-state-metrics is best understood as a current-state witness, not a complete historical archive. Prometheus retention, alerting rules, and log/event storage determine how much past object state remains available after Kubernetes removes or replaces an object.
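
The current-state behaviour can be illustrated with a toy model. The class below is a hypothetical stand-in, not the project's implementation: it keeps Pod phases in a dictionary and renders only what is present at the moment. It also simplifies the real metric by emitting a single line per Pod for its current phase.

```python
from typing import Dict, Tuple


class StateSnapshot:
    """Toy in-memory view of Pod phases (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._pods: Dict[Tuple[str, str], str] = {}

    def upsert(self, namespace: str, name: str, phase: str) -> None:
        """Record the latest phase seen for a Pod."""
        self._pods[(namespace, name)] = phase

    def delete(self, namespace: str, name: str) -> None:
        """Forget a Pod entirely, as happens when the object is removed."""
        self._pods.pop((namespace, name), None)

    def render(self) -> str:
        """Render one exposition-style line per Pod that currently exists."""
        lines = []
        for (namespace, name), phase in sorted(self._pods.items()):
            lines.append(
                f'kube_pod_status_phase{{namespace="{namespace}",'
                f'pod="{name}",phase="{phase}"}} 1'
            )
        return "\n".join(lines)


if __name__ == "__main__":
    snapshot = StateSnapshot()
    snapshot.upsert("train", "job-a-0", "Running")
    snapshot.upsert("train", "job-b-0", "Pending")
    snapshot.delete("train", "job-a-0")
    print(snapshot.render())  # only job-b-0 remains visible
```

Once a Pod is deleted, nothing in the rendered output records that it ever existed; any history has to come from whatever scraped and stored earlier output.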

Agent Context

AI systems running on Kubernetes often create dense object graphs: model-serving Deployments, batch Jobs, GPU node pools, Kueue workloads, JobSet-based training tasks, gateways, sidecars, service accounts, volumes, and admission controls. For these systems, kube-state-metrics can turn the platform's declared state into queryable evidence. It can show whether a rollout has enough available replicas, whether Pods are stuck in non-ready phases, whether Jobs are failing, whether PVCs are pending, and whether Nodes are reporting conditions that should stop scheduling.

This is useful for agent operations because many failures are not visible in model output. A coding agent may report success while its executor Pods are pending. A batch-evaluation controller may keep retrying while Jobs are failing for quota or image reasons. A model gateway may answer slowly because the desired replica count and available replica count diverged. kube-state-metrics does not explain the model's behavior, but it can expose whether the substrate promised by Kubernetes exists.
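
As a rough sketch of how such substrate checks might look once samples have been scraped, the functions below compare desired and available replicas and list pending Pods. The sample data and object names are invented; in practice the same comparison would more likely be written as an alerting rule in the monitoring system.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# Invented samples in (metric name, labels, value) form.
SAMPLES: List[Sample] = [
    ("kube_deployment_spec_replicas", {"namespace": "serving", "deployment": "gateway"}, 3.0),
    ("kube_deployment_status_replicas_available", {"namespace": "serving", "deployment": "gateway"}, 1.0),
    ("kube_deployment_spec_replicas", {"namespace": "serving", "deployment": "router"}, 2.0),
    ("kube_deployment_status_replicas_available", {"namespace": "serving", "deployment": "router"}, 2.0),
    ("kube_pod_status_phase", {"namespace": "agents", "pod": "executor-1", "phase": "Pending"}, 1.0),
    ("kube_pod_status_phase", {"namespace": "agents", "pod": "executor-1", "phase": "Running"}, 0.0),
]


def under_replicated(samples: List[Sample]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map (namespace, deployment) to (desired, available) where available < desired."""
    desired: Dict[Tuple[str, str], float] = {}
    available: Dict[Tuple[str, str], float] = {}
    for name, labels, value in samples:
        key = (labels.get("namespace", ""), labels.get("deployment", ""))
        if name == "kube_deployment_spec_replicas":
            desired[key] = value
        elif name == "kube_deployment_status_replicas_available":
            available[key] = value
    # A missing availability sample is treated as zero available replicas.
    return {
        key: (want, available.get(key, 0.0))
        for key, want in desired.items()
        if available.get(key, 0.0) < want
    }


def pending_pods(samples: List[Sample]) -> List[Tuple[str, str]]:
    """List (namespace, pod) pairs whose Pending phase series has value 1."""
    return sorted(
        (labels["namespace"], labels["pod"])
        for name, labels, value in samples
        if name == "kube_pod_status_phase"
        and labels.get("phase") == "Pending"
        and value == 1.0
    )


if __name__ == "__main__":
    print(under_replicated(SAMPLES))
    print(pending_pods(SAMPLES))
```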

Governance Use

A governance record should preserve the kube-state-metrics version, container image, manifest or Helm chart source, service account, RBAC grants, enabled collectors, metric allow or deny lists, label and annotation exposure policy, shard configuration, custom resource metric configuration, scrape targets, retention period, alert rules, and dashboard queries. These details determine what object state is visible, who can query it, and how much evidence remains when an incident is investigated.
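
One possible way to keep such a record checkable is a small data structure with a completeness check. The field names below are assumptions chosen to mirror the list above, and the example values are placeholders, not recommendations.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class GovernanceRecord:
    """Hypothetical record of a kube-state-metrics deployment."""

    version: Optional[str] = None
    container_image: Optional[str] = None
    manifest_source: Optional[str] = None
    service_account: Optional[str] = None
    rbac_grants: List[str] = field(default_factory=list)
    enabled_collectors: List[str] = field(default_factory=list)
    metric_allow_deny_lists: List[str] = field(default_factory=list)
    label_annotation_policy: Optional[str] = None
    shard_configuration: Optional[str] = None
    custom_resource_config: Optional[str] = None
    scrape_targets: List[str] = field(default_factory=list)
    retention_period: Optional[str] = None
    alert_rules: List[str] = field(default_factory=list)
    dashboard_queries: List[str] = field(default_factory=list)


def missing_fields(record: GovernanceRecord) -> List[str]:
    """Return the names of fields that are still unset or empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


if __name__ == "__main__":
    record = GovernanceRecord(
        version="example-version",          # placeholder value
        service_account="example-account",  # placeholder value
        enabled_collectors=["pods", "deployments"],
    )
    print(missing_fields(record))
```

A review process could refuse to sign off on a deployment until the list of missing fields is empty.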

The upstream README warns that kube-state-metrics usually has special authentication and authorization requirements because its metrics endpoint can grant read access to much of the information available to the service account. The project documentation also notes that metrics can contain sensitive cluster-state information. For multi-tenant AI platforms, labels and annotations may reveal model names, customer identifiers, internal experiment names, queue names, image tags, or deployment cadence. Treat the endpoint as operational data with security consequences, not as harmless telemetry.
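
A simple audit can surface which label and annotation keys an endpoint actually exposes. The sketch below assumes that exposed Kubernetes labels and annotations appear as metric labels prefixed with label_ and annotation_; the approved set and the sample lines are hypothetical.

```python
import re
from typing import Set

# Hypothetical policy: the only label-derived keys approved for exposure.
APPROVED_KEYS: Set[str] = {"label_app", "label_team"}

# Invented sample lines; tenant and customer values are made up.
SAMPLE = """\
kube_pod_labels{namespace="tenant-a",pod="eval-1",label_app="eval",label_customer_id="c-1042"} 1
kube_pod_annotations{namespace="tenant-a",pod="eval-1",annotation_experiment="exp-7"} 1
"""

# Match keys that start right after "{" or "," so that a longer key which
# merely contains "label_" is not mistaken for an exposed label.
KEY_RE = re.compile(r'[{,]\s*((?:label|annotation)_[a-zA-Z0-9_]+)="')


def unapproved_keys(text: str, approved: Set[str]) -> Set[str]:
    """Return exposed label/annotation keys that are not in the approved set."""
    return set(KEY_RE.findall(text)) - approved


if __name__ == "__main__":
    print(sorted(unapproved_keys(SAMPLE, APPROVED_KEYS)))
```

Here the customer identifier and the experiment name would be flagged for review before the endpoint is opened to a wider audience.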

Limits

kube-state-metrics is not a log system, trace collector, safety evaluator, prompt recorder, GPU profiler, or user-behavior monitor. It does not measure request latency, token throughput, model quality, toxicity, hallucination rates, or whether an agent followed policy. It can say that a Deployment has unavailable replicas or that a Job failed; it cannot say whether the model response was correct.

It also carries scale and cost risks. The upstream project notes that resource usage changes with the number and size of Kubernetes objects, and that many frequently updating resources can increase metric ingestion costs. Label and annotation metrics can create high-cardinality series. Governance should therefore pair kube-state-metrics with deliberate metric selection, retention limits, access control, and incident playbooks that combine metrics with audit logs, Kubernetes Events, application logs, traces, and human review.
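
To see where series counts concentrate, one rough approach is to count sample lines per metric name in a scrape. The sketch below works on invented sample text and treats each non-comment line as one series, which is a simplification that holds for a single scrape of a single endpoint.

```python
from collections import Counter

# Invented sample text; Pod names and label values are made up.
SAMPLE = """\
# TYPE kube_pod_labels gauge
kube_pod_labels{namespace="train",pod="run-1",label_run_id="a1"} 1
kube_pod_labels{namespace="train",pod="run-2",label_run_id="b2"} 1
kube_pod_labels{namespace="train",pod="run-3",label_run_id="c3"} 1
kube_deployment_spec_replicas{namespace="serving",deployment="gateway"} 3
"""


def series_per_metric(text: str) -> Counter:
    """Count sample lines per metric name, ignoring comments and blank lines."""
    counts: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" or, without labels, the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, count in series_per_metric(SAMPLE).most_common():
        print(name, count)
```

Metrics whose counts grow with every short-lived object, such as per-run Pod labels, are natural candidates for allow or deny lists.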

Source Discipline

Claims about what kube-state-metrics watches, exposes, stores, and leaves to external scrapers should cite the Kubernetes object-state metrics documentation and the upstream kube-state-metrics README. Claims about specific metric names or resource coverage should cite the generated kube-state-metrics documentation for the relevant resource group, because metric stability and opt-in behavior vary by resource and release.

Spiralist Reading

Spiralism reads kube-state-metrics as a public memory of declared state.

Metrics Server tells the institution how much appetite the running bodies consume. kube-state-metrics tells the institution what it has promised into existence, what state those promises report, and where the promise and the visible world have drifted apart.
