Wiki · Concept · Last reviewed June 25, 2026

Kubernetes Prometheus Adapter

Kubernetes Prometheus Adapter is the bridge that lets Prometheus-collected metrics appear through Kubernetes metrics APIs, so autoscalers can scale workloads from application, queue, model-serving, or business signals rather than CPU and memory alone.

Definition

Kubernetes Prometheus Adapter is a Kubernetes SIG Instrumentation project that implements Kubernetes metrics APIs using Prometheus as the backing metrics system. The upstream repository describes it as an implementation of the Kubernetes custom, resource, and external metric APIs, suitable for use with the autoscaling/v2 Horizontal Pod Autoscaler.

In ordinary Kubernetes autoscaling, Metrics Server supplies basic CPU and memory resource metrics. The Kubernetes HPA documentation says HPA commonly fetches metrics from aggregated APIs: metrics.k8s.io, custom.metrics.k8s.io, or external.metrics.k8s.io. Prometheus Adapter operates in that extension space: it lets a cluster expose Prometheus time series as Kubernetes API resources that HPA can read.

How It Works

The adapter sits between three systems: the Kubernetes API aggregation layer, the HPA controller, and Prometheus. Applications or exporters expose metrics; Prometheus scrapes and stores those metrics; Prometheus Adapter maps selected time series into Kubernetes metrics API responses; HPA reads those responses when calculating desired replicas.
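The last step of that chain, HPA turning a metric value into a replica count, can be sketched with the ratio given in the Kubernetes HPA documentation: desired replicas equal the ceiling of current replicas times current value over target value. The function name and the example numbers below are illustrative assumptions. The 0.1 tolerance is the documented default, and the real controller also handles missing metrics, unready pods, min/max bounds, and stabilization windows, none of which appear here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     tolerance: float = 0.1) -> int:
    """Simplified version of the documented HPA ratio calculation."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    ratio = current_value / target_value
    # When the ratio is close enough to 1.0, the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical case: 4 replicas, metric at 150 against a target of 100.
    print(desired_replicas(4, 150.0, 100.0))
```

Whatever value the adapter returns goes straight into this ratio, which is why the mapping from Prometheus series to API response deserves the same scrutiny as the HPA object itself.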

The project walkthrough explains the basic motivation: to expose metrics beyond CPU and memory to Kubernetes for autoscaling, a cluster needs an adapter that serves the custom metrics API. If Prometheus already collects the relevant application metrics, Prometheus Adapter can serve those metrics out of Prometheus. The same walkthrough shows HPA trying to fetch a pod metric from /apis/custom.metrics.k8s.io/ before the adapter is registered, then using the registered API after the adapter and its APIService are configured.
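As a rough illustration of what HPA is requesting, the sketch below builds the kind of path a pod metric lookup resolves to under /apis/custom.metrics.k8s.io/. The v1beta1 version segment, the namespaces/.../pods/*/... layout, and the metric name http_requests are assumptions based on the usual custom metrics API convention; the version a given cluster actually serves should be checked against its registered APIService.

```python
from urllib.parse import quote


def pod_metric_path(namespace: str, metric: str,
                    api_version: str = "v1beta1") -> str:
    """Build a custom metrics API path for a metric across all pods in a namespace.

    api_version is an assumption; confirm it against the cluster's APIService.
    """
    return (
        f"/apis/custom.metrics.k8s.io/{api_version}"
        f"/namespaces/{quote(namespace, safe='')}"
        f"/pods/*/{quote(metric, safe='')}"
    )


if __name__ == "__main__":
    # Hypothetical namespace and metric name.
    print(pod_metric_path("default", "http_requests"))
```

A path like this can be requested through the API server once the adapter's APIService is registered. Before registration, the same request fails, which matches the behavior the walkthrough describes.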

Configuration is where most of the work lies. Prometheus metric names and labels do not automatically correspond to Kubernetes resources. The adapter configuration uses rules to discover series, map labels to Kubernetes resources, rename metrics, and generate Prometheus queries. The upstream README notes that HPA does not perform special logic to associate a resource with a series; the adapter configuration must make that association. A metric that looks correct in Prometheus can still be invisible to Kubernetes if labels, namespaces, names, or query rules do not line up.
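The four jobs of a rule (discovery, label-to-resource association, renaming, and query generation) can be modeled in a few lines. The real adapter reads these rules from a YAML config file; the dictionary below mirrors the field names and template placeholders used in the upstream configuration documentation, but the series name, labels, and the two helper functions are hypothetical. The sketch handles a single pod with equality matchers only and uses Python's \1 group syntax where the adapter config uses Go's ${1}.

```python
import re

# Hypothetical rule, shaped like an adapter config entry.
RULE = {
    "seriesQuery": 'http_requests_total{namespace!="",pod!=""}',
    "resources": {
        "overrides": {
            "namespace": {"resource": "namespace"},
            "pod": {"resource": "pod"},
        }
    },
    "name": {"matches": r"^(.*)_total$", "as": r"\1_per_second"},
    "metricsQuery": "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)",
}


def exposed_name(rule: dict, series: str) -> str:
    """Apply the rename step: Prometheus series name -> metric name seen by HPA."""
    return re.sub(rule["name"]["matches"], rule["name"]["as"], series)


def render_query(rule: dict, series: str, matchers: dict, group_by: str) -> str:
    """Fill the query template for one requested resource (simplified)."""
    label_matchers = ",".join(f'{k}="{v}"' for k, v in sorted(matchers.items()))
    return (
        rule["metricsQuery"]
        .replace("<<.Series>>", series)
        .replace("<<.LabelMatchers>>", label_matchers)
        .replace("<<.GroupBy>>", group_by)
    )


if __name__ == "__main__":
    print(exposed_name(RULE, "http_requests_total"))
    print(render_query(RULE, "http_requests_total",
                       {"namespace": "default", "pod": "sample-app-1"}, "pod"))
```

If the series lacked a pod or namespace label, the overrides would have nothing to attach to, and the metric would not appear under the corresponding Kubernetes resource however healthy it looked in Prometheus.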

Agent Context

AI infrastructure often needs autoscaling signals that are not resource metrics. A model endpoint may need to scale from requests per second, tokens in flight, queue wait time, batch backlog, GPU memory pressure, or an SLO-derived latency signal. A document-processing agent may need to scale from pending tasks rather than CPU. A moderation pipeline may need a different signal for urgent queues than for ordinary background work.

Prometheus Adapter can make those signals available to HPA if they are already present in Prometheus and can be mapped to the correct Kubernetes resource. That gives operators a narrow but powerful bridge: application telemetry can become replica decisions. For governance, the bridge matters because a dashboard metric becomes operational authority when it can cause the cluster to spend compute, admit more work, or hide a bottleneck behind new replicas.

Governance Use

A governance record should preserve the Prometheus Adapter version, installation method, image source, namespace, RBAC, APIService objects, TLS and aggregation-layer assumptions, Prometheus URL, config file, discovery rules, resource overrides, metric rename rules, PromQL templates, HPA objects that consume the exposed metrics, and dashboards or alerts that verify the same signal.

Review should focus on metric meaning. If an HPA scales a model worker from requests_per_second, the record should say whether the value is per pod, per deployment, averaged, summed, rate-converted, filtered by tenant, or divided by active replicas. If it scales on queue depth, the record should say whether retries, delayed messages, dead-letter queues, and poison-pill tasks are included. If it scales on latency, the record should say whether the metric is raw, histogram-derived, windowed, or SLO-filtered.
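A small worked example shows why the per-pod versus summed distinction matters. The sketch assumes an HPA target expressed as an average value per pod, in which case the documented ratio reduces to the ceiling of the summed metric divided by the target. The function name and the numbers are illustrative, and tolerance and replica bounds are ignored.

```python
import math


def replicas_for_average_target(per_pod_values: list[float],
                                target_average: float) -> int:
    """Replicas needed so the average per-pod value meets the target (simplified)."""
    if target_average <= 0:
        raise ValueError("target_average must be positive")
    return math.ceil(sum(per_pod_values) / target_average)


if __name__ == "__main__":
    target = 50.0  # hypothetical target: 50 requests per second per pod

    # Rule exposes a true per-pod rate.
    print(replicas_for_average_target([40.0, 50.0, 60.0], target))

    # Rule mistakenly exposes the deployment-wide sum (150) as each pod's value.
    print(replicas_for_average_target([150.0, 150.0, 150.0], target))
```

The same traffic yields three replicas in the first case and nine in the second. Nothing in the HPA object reveals the difference; only the recorded meaning of the metric does.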

Limits

Prometheus Adapter is not Prometheus, not Metrics Server, not KEDA, and not a general AI control system. It does not scrape application metrics by itself, decide whether a metric is meaningful, create model-quality evaluations, protect prompt data, or judge whether additional replicas are a good policy response. It exposes configured metrics through Kubernetes metrics APIs.

The operational risks are mostly translation risks. Missing labels, high-cardinality labels, stale Prometheus data, wrong PromQL, mismatched namespaces, broken APIService registration, certificate problems, and incorrect aggregation can all produce bad autoscaling decisions. A custom metric can look sophisticated while still measuring the wrong thing. That is especially dangerous for costly AI workloads, where a bad scale rule can create cloud spend, reduce availability, or mask a product failure.
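Two of these risks, missing labels and stale data, are simple enough to check mechanically before trusting a scale rule. The helper below is a hypothetical review aid, not part of the adapter: the required label names and the 300-second freshness threshold are assumptions that would need to match the actual rules and scrape interval in use.

```python
def translation_problems(labels: dict,
                         sample_age_seconds: float,
                         required_labels: tuple = ("namespace", "pod"),
                         max_age_seconds: float = 300.0) -> list:
    """Return a list of problems found for one Prometheus sample (hypothetical check)."""
    problems = []
    for name in required_labels:
        # An absent or empty label cannot be mapped to a Kubernetes resource.
        if not labels.get(name):
            problems.append(f"missing label: {name}")
    if sample_age_seconds > max_age_seconds:
        problems.append("stale sample")
    return problems


if __name__ == "__main__":
    # Hypothetical sample: no pod label, last updated ten minutes ago.
    print(translation_problems({"namespace": "default"}, 600.0))
```

Checks like this do not establish that a metric measures the right thing. They only rule out the cases where it cannot be translated at all.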

Source Discipline

Claims about Prometheus Adapter's API coverage, HPA suitability, official image locations, configuration model, and troubleshooting should cite the Kubernetes SIGs Prometheus Adapter repository and its documentation. Claims about how HPA consumes resource, custom, and external metrics should cite the Kubernetes HPA documentation. Claims about a local metric's meaning should cite the application or exporter that emits it, not only the adapter rule that exposes it.

Spiralist Reading

Spiralism reads Prometheus Adapter as a translation office for institutional pressure.

Prometheus remembers measurements. Kubernetes acts on desired state. The adapter is where a measurement becomes a reason for action. In that crossing, governance asks whether the signal is honest, whether the map is accurate, and whether the cost of obedience has been named.
