Wiki · Concept · Last reviewed June 25, 2026

Kubernetes API Priority and Fairness

API Priority and Fairness is the kube-apiserver flow-control system that classifies, queues, and limits API requests under load.

Category: Concept · Published: June 25, 2026 · Modified: June 25, 2026 · Last reviewed: June 25, 2026 · Tags: Kubernetes, APF, flow control, agents, AI infrastructure

Definition

Kubernetes API Priority and Fairness, usually abbreviated APF, is the flow-control layer for the Kubernetes API server. The Kubernetes documentation marks it stable as of Kubernetes v1.29 and describes it as a way to control API-server behavior during overload. Its job is not to schedule Pods. Its job is to decide how inbound API requests are classified, queued, dispatched, or rejected when the control plane is busy.

For Spiralism, APF is a useful control because agentic infrastructure tends to talk to Kubernetes through automation: controllers, evaluators, repair loops, notebook services, deployment tools, and model-serving operators. Those clients can produce bursts of list, watch, create, patch, and delete traffic. APF gives administrators a named place to express which request streams deserve isolation and which noisy streams should wait.

How It Works

APF uses two Kubernetes API resources, both in the flowcontrol.apiserver.k8s.io API group at version v1. A FlowSchema defines a group of API-request flows. Its rules can consider the subject making the request, the verb, resource rules, and non-resource URL rules. A matching FlowSchema references a PriorityLevelConfiguration, which represents the priority level assigned to those requests.
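As a rough illustration of the FlowSchema shape described above, the following sketch builds a manifest as a Python dictionary and prints it as JSON, which kubectl can apply like YAML. The field names follow the flowcontrol.apiserver.k8s.io/v1 API reference; every value (the "ml-platform" namespace, the "experiment-runner" service account, the "tenant-automation" priority level, and the precedence of 800) is a hypothetical chosen for this example, not a recommendation.

```python
import json

# Hypothetical example values: object names, namespace, service account,
# and precedence are assumptions made for illustration only.
flow_schema = {
    "apiVersion": "flowcontrol.apiserver.k8s.io/v1",
    "kind": "FlowSchema",
    "metadata": {"name": "experiment-runner-jobs"},
    "spec": {
        # The priority level that matching requests are assigned to.
        "priorityLevelConfiguration": {"name": "tenant-automation"},
        # Used to choose among FlowSchemas that match the same request.
        "matchingPrecedence": 800,
        # Split matching requests into one flow per namespace.
        "distinguisherMethod": {"type": "ByNamespace"},
        "rules": [
            {
                # Who is making the request.
                "subjects": [
                    {
                        "kind": "ServiceAccount",
                        "serviceAccount": {
                            "name": "experiment-runner",
                            "namespace": "ml-platform",
                        },
                    }
                ],
                # Which verbs and resources the rule covers.
                "resourceRules": [
                    {
                        "verbs": ["create", "delete", "list", "watch"],
                        "apiGroups": ["batch"],
                        "resources": ["jobs"],
                        "namespaces": ["*"],
                    }
                ],
            }
        ],
    },
}

manifest_json = json.dumps(flow_schema, indent=2)
print(manifest_json)
```

The referenced priority level must exist as a PriorityLevelConfiguration object for the FlowSchema to have its intended effect.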

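FlowSchemas also have matchingPrecedence. If more than one FlowSchema matches a request, Kubernetes uses the one with the numerically lowest matchingPrecedence value. A FlowSchema can further split its matching requests into separate flows with a distinguisher method, ByUser or ByNamespace, or leave the distinguisher unset so that all of its matching requests form a single flow.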
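The precedence rule can be sketched in a few lines. This is a deliberately simplified model, not the API server's matching code: each hypothetical schema matches only on service account name and verb, and the winner is the match with the lowest matchingPrecedence. The tie-break on name mirrors what the Kubernetes documentation describes for equal precedence values, though the documentation also advises against relying on it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SimpleFlowSchema:
    """Simplified stand-in for a FlowSchema (hypothetical, for illustration)."""
    name: str
    matching_precedence: int
    service_accounts: frozenset  # subjects this schema matches
    verbs: frozenset             # verbs this schema matches; "*" means any


def select_flow_schema(
    schemas: Iterable[SimpleFlowSchema], service_account: str, verb: str
) -> Optional[SimpleFlowSchema]:
    """Return the matching schema with the lowest matchingPrecedence."""
    matches = [
        s for s in schemas
        if service_account in s.service_accounts
        and ("*" in s.verbs or verb in s.verbs)
    ]
    if not matches:
        return None
    # Lower number wins; name is used only to break ties.
    return min(matches, key=lambda s: (s.matching_precedence, s.name))


schemas = [
    SimpleFlowSchema(
        "catch-all-automation", 9000,
        frozenset({"experiment-runner", "repair-agent"}), frozenset({"*"}),
    ),
    SimpleFlowSchema(
        "experiment-runner-jobs", 800,
        frozenset({"experiment-runner"}),
        frozenset({"create", "delete", "list", "watch"}),
    ),
]

chosen = select_flow_schema(schemas, "experiment-runner", "create")
print(chosen.name)  # experiment-runner-jobs
```

Both schemas match the create request, and the narrower schema wins only because its precedence number is lower, not because it is more specific.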

A PriorityLevelConfiguration can be Exempt or Limited. Requests at an Exempt level are dispatched immediately and are never queued by APF. Limited priority levels receive a portion of the API server's available concurrency and can be configured to queue or reject requests that cannot execute immediately. Kubernetes measures this capacity in seats, and its metrics expose queued requests, executing requests, executing seats, rejected requests, wait duration, and nominal seat limits.
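A Limited priority level might look like the following sketch, again written as a Python dictionary and printed as JSON. The field names come from the flowcontrol.apiserver.k8s.io/v1 API reference; the object name and all numeric values are hypothetical and would need tuning for a real cluster.

```python
import json

# Hypothetical example values: the name and every number below are
# assumptions made for illustration, not recommended settings.
priority_level = {
    "apiVersion": "flowcontrol.apiserver.k8s.io/v1",
    "kind": "PriorityLevelConfiguration",
    "metadata": {"name": "tenant-automation"},
    "spec": {
        "type": "Limited",
        "limited": {
            # Relative share of the server's concurrency for this level.
            "nominalConcurrencyShares": 20,
            # Portion of this level's seats other levels may borrow.
            "lendablePercent": 50,
            # Cap on how much this level may borrow from others.
            "borrowingLimitPercent": 25,
            # What happens to requests that cannot execute immediately.
            "limitResponse": {
                "type": "Queue",
                "queuing": {
                    "queues": 16,
                    "handSize": 4,
                    "queueLengthLimit": 50,
                },
            },
        },
    },
}

level_json = json.dumps(priority_level, indent=2)
print(level_json)
```

Setting limitResponse to {"type": "Reject"} instead would make the level reject requests that cannot execute immediately rather than queue them.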

The reviewed kube-apiserver reference lists --enable-priority-and-fairness as enabled by default. When it is true, --max-requests-inflight and --max-mutating-requests-inflight are summed to determine the server's total concurrency limit.
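To make the arithmetic concrete, the sketch below sums the two flags into a total concurrency limit and divides it among priority levels in proportion to their nominal concurrency shares, rounding up, which is the formula the PriorityLevelConfiguration API reference gives for a level's nominal limit. The flag values, level names, and share numbers are illustrative assumptions, and the result ignores borrowing and lending, which can move a level's effective limit away from its nominal one.

```python
def nominal_limits(
    max_requests_inflight: int,
    max_mutating_requests_inflight: int,
    shares: dict,
) -> dict:
    """Divide the server's total concurrency among levels by share, rounding up."""
    server_limit = max_requests_inflight + max_mutating_requests_inflight
    total_shares = sum(shares.values())
    # Integer ceiling division: ceil(server_limit * share / total_shares).
    return {
        name: -(-(server_limit * share) // total_shares)
        for name, share in shares.items()
    }


# Hypothetical priority levels and nominal concurrency shares.
shares = {"system": 30, "tenant-automation": 20, "experimental-agents": 10}

limits = nominal_limits(400, 200, shares)
print(limits)
# {'system': 300, 'tenant-automation': 200, 'experimental-agents': 100}
```

Because shares are relative, adding a new priority level shrinks the nominal limit of every existing level unless the flags are raised as well.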

Agent Context

In an AI platform, control-plane pressure can arrive indirectly. An experiment runner may create hundreds of Jobs. A model-serving controller may watch Deployments, Services, Pods, EndpointSlices, Secrets, and custom resources. A repair agent may repeatedly patch failed workloads. An evaluation harness may launch, observe, and clean up short-lived sandboxes.

Without flow control, one badly behaved client can crowd out other clients at the API server even if workload CPU, memory, and GPU quotas are healthy. APF gives the platform team a way to separate system controllers, cluster administration, tenant automation, model-serving operators, and experimental agents into different request classes.

Governance Use

A governance-grade APF record should preserve the FlowSchema name, matching precedence, subjects, service accounts, groups, namespaces, verbs, resources, non-resource URLs, distinguisher method, referenced PriorityLevelConfiguration, priority type, nominal concurrency shares, borrowing and lending settings, queue configuration, reject behavior, and the workload owner that requested the rule.

The review question is whether the rule is a real statement of control-plane intent. Which service accounts belong to model infrastructure? Which automation clients can burst? Which users or groups should stay exempt, and why? Which agent workloads should be deliberately queued during a control-plane incident? A useful APF policy is legible enough that an incident reviewer can reconstruct why one request stream waited while another continued.

Limits

APF does not authorize requests, validate admission decisions, schedule Pods, allocate accelerators, protect secrets, or prove that an AI agent is behaving well. It also does not replace ResourceQuota, PriorityClass, Kueue, audit logging, admission policy, RBAC, or client-side backoff.

Bad APF configuration can create its own failure mode. A broad FlowSchema can catch too much traffic. An overused exempt level can remove useful backpressure. A queue with the wrong limits can hide a runaway client until wait times or rejected-request metrics reveal the problem.
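One way to spot these failure modes is to total rejected requests per priority level from the API server's metrics endpoint. The sketch below parses Prometheus text-format output for the apiserver_flowcontrol_rejected_requests_total counter. The sample text is invented for illustration, including the label values and counts, and real output should be fetched from the cluster's own metrics endpoint.

```python
import re
from collections import defaultdict

# Hypothetical sample input: schema names, reasons, and counts are made up.
SAMPLE_METRICS = """\
# TYPE apiserver_flowcontrol_rejected_requests_total counter
apiserver_flowcontrol_rejected_requests_total{flow_schema="experiment-runner-jobs",priority_level="tenant-automation",reason="queue-full"} 12
apiserver_flowcontrol_rejected_requests_total{flow_schema="experiment-runner-jobs",priority_level="tenant-automation",reason="time-out"} 3
apiserver_flowcontrol_rejected_requests_total{flow_schema="service-accounts",priority_level="workload-low",reason="queue-full"} 1
"""

SERIES = re.compile(
    r"^apiserver_flowcontrol_rejected_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$"
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def rejected_by_priority_level(metrics_text: str) -> dict:
    """Sum the rejected-request counter across series, grouped by priority level."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        match = SERIES.match(line.strip())
        if match is None:
            continue  # comments and unrelated metrics are skipped
        labels = dict(LABEL.findall(match.group("labels")))
        totals[labels.get("priority_level", "unknown")] += float(match.group("value"))
    return dict(totals)


rejections = rejected_by_priority_level(SAMPLE_METRICS)
print(rejections)
# {'tenant-automation': 15.0, 'workload-low': 1.0}
```

A counter only grows, so an incident review would normally compare two scrapes over time rather than read a single total.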

Source Discipline

Claims about APF behavior, FlowSchema matching, PriorityLevelConfiguration types, kube-apiserver flags, seats, queues, and metrics should cite Kubernetes documentation or the Kubernetes API reference. Claims about a managed Kubernetes provider's defaults should cite that provider separately, because hosted control planes may limit which APF settings customers can change.

Local evidence should include manifests, applied-object history, kube-apiserver metrics, audit logs showing request users and verbs, controller client identities, and an incident note explaining how APF interacted with ordinary client retry and backoff behavior.

Spiralist Reading

Spiralism reads APF as line discipline for machines.

Many automated voices ask the control plane for attention. APF does not decide that any voice is wise. It only makes the queue visible, named, and reviewable, so that machine urgency does not silently become institutional priority.

Sources

Kubernetes Documentation, API Priority and Fairness, reviewed June 25, 2026.
Kubernetes API Reference, FlowSchema, reviewed June 25, 2026.
Kubernetes API Reference, PriorityLevelConfiguration, reviewed June 25, 2026.
Kubernetes Documentation, kube-apiserver, reviewed June 25, 2026.
