KEDA
KEDA is Kubernetes Event-driven Autoscaling: a Kubernetes add-on that lets workloads scale from external events such as queue depth, stream lag, incoming requests, database state, telemetry queries, or custom scaler output.
Definition
KEDA, short for Kubernetes Event-driven Autoscaling, is a Kubernetes-based autoscaler for event-driven workloads. The official KEDA site describes it as a component that can drive the scaling of containers in Kubernetes based on the number of events needing to be processed. The upstream repository says KEDA supports fine-grained autoscaling, including scaling to and from zero, by defining autoscaling rules in Kubernetes custom resources.
KEDA does not replace the ordinary Kubernetes autoscaling stack. Its documentation says it works alongside the Horizontal Pod Autoscaler, feeding HPA external metrics so workloads can scale on signals beyond CPU and memory. This makes it adjacent to, but distinct from, Metrics Server and kube-state-metrics: KEDA is concerned with event pressure and scaler output, not only resource usage or object state.
How It Works
KEDA watches custom resources that describe what to scale and which trigger should drive the decision. For long-running workloads, the central resource is ScaledObject. The KEDA ScaledObject specification says it defines triggers and scaling behavior for Deployments, StatefulSets, and custom resources that expose the Kubernetes /scale subresource. A ScaledObject references a target and declares one or more triggers; KEDA then creates and manages an HPA for that target and feeds it external metrics derived from those triggers.
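As a rough illustration of that shape, the following Python sketch builds a ScaledObject manifest as a plain dictionary. The keda.sh/v1alpha1 API version and the scaleTargetRef, minReplicaCount, maxReplicaCount, pollingInterval, cooldownPeriod, and triggers fields follow the KEDA ScaledObject specification. The resource names, the choice of a RabbitMQ trigger, and its metadata values are assumptions made for the example and should be checked against that scaler's reference page.

```python
import json


def build_scaled_object(name, deployment, triggers, min_replicas=0, max_replicas=10):
    """Return a ScaledObject manifest (as a dict) that targets a Deployment by name."""
    if not triggers:
        raise ValueError("a ScaledObject needs at least one trigger")
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            # With no kind given, the target is treated as a Deployment.
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": min_replicas,   # 0 allows scale to zero
            "maxReplicaCount": max_replicas,   # explicit ceiling
            "pollingInterval": 30,             # seconds between trigger checks
            "cooldownPeriod": 300,             # seconds of inactivity before scaling to zero
            "triggers": triggers,
        },
    }


# Illustrative trigger only: queue name, target value, and env var are hypothetical.
queue_trigger = {
    "type": "rabbitmq",
    "metadata": {
        "queueName": "embedding-requests",
        "mode": "QueueLength",
        "value": "20",
        "hostFromEnv": "RABBITMQ_HOST",
    },
}

scaled_object = build_scaled_object(
    "embeddings-worker-scaler", "embeddings-worker", [queue_trigger]
)
print(json.dumps(scaled_object, indent=2))
```

Because JSON is also valid YAML, the printed manifest could be reviewed or applied like any other Kubernetes object; the sketch itself never contacts a cluster.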
For batch work, KEDA also provides ScaledJob. The ScaledJob specification describes a custom resource whose triggers and scaling behavior create Kubernetes Jobs from a job template. This is a different control pattern from scaling a running Deployment: KEDA can create new Jobs in response to events, which fits queue-shaped work where each unit should become a bounded task.
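The difference in control pattern shows up in the resource itself: a ScaledJob carries a Job template rather than a reference to a running workload. The sketch below builds such a manifest as a Python dictionary. The jobTargetRef, pollingInterval, maxReplicaCount, and triggers fields follow the ScaledJob specification; the container image, resource names, and trigger values are hypothetical.

```python
def build_scaled_job(name, image, triggers, max_jobs=5):
    """Return a ScaledJob manifest (as a dict) whose Jobs run one worker container."""
    if not triggers:
        raise ValueError("a ScaledJob needs at least one trigger")
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledJob",
        "metadata": {"name": name},
        "spec": {
            # jobTargetRef holds a Job spec: each scaling decision creates Jobs from it.
            "jobTargetRef": {
                "backoffLimit": 2,
                "template": {
                    "spec": {
                        "containers": [{"name": "worker", "image": image}],
                        "restartPolicy": "Never",
                    }
                },
            },
            "pollingInterval": 30,        # seconds between trigger checks
            "maxReplicaCount": max_jobs,  # explicit upper bound for this ScaledJob
            "triggers": triggers,
        },
    }


# Hypothetical evaluation harness: image, queue, and env var names are illustrative.
eval_job = build_scaled_job(
    "eval-harness",
    "registry.example.com/eval-worker:1.0",
    [{
        "type": "rabbitmq",
        "metadata": {
            "queueName": "eval-tasks",
            "mode": "QueueLength",
            "value": "1",
            "hostFromEnv": "RABBITMQ_HOST",
        },
    }],
)
print(eval_job["kind"], "with max", eval_job["spec"]["maxReplicaCount"], "jobs")
```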
The official concepts page describes KEDA as monitoring event sources such as message queues, databases, APIs, or incoming requests. When events appear, KEDA can activate a workload; when demand falls, it can scale back down, including to zero when configured that way. The KEDA homepage lists many built-in scalers across cloud platforms, databases, messaging systems, telemetry systems, CI/CD systems, and other sources.
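The activate-then-scale behavior can be approximated with a small model. The function below is a simplified sketch, not KEDA's implementation: it assumes a single queue-style trigger with an average-value target, treats the workload as inactive when the backlog is at or below an activation threshold, and otherwise applies the ceiling division that the Kubernetes HPA uses for average-value metrics. It ignores polling intervals, cooldown, and HPA stabilization windows, and all parameter names are chosen for the example.

```python
import math


def desired_replicas(backlog, target_per_replica, min_replicas=0,
                     max_replicas=10, activation_threshold=0):
    """Simplified model: map a backlog to a replica count, allowing scale to zero."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Inactive: nothing above the activation threshold, so fall back to the minimum.
    if backlog <= activation_threshold:
        return min_replicas
    # Active: at least one replica, then one replica per target-sized share of backlog.
    wanted = math.ceil(backlog / target_per_replica)
    floor = max(1, min_replicas)
    return max(floor, min(wanted, max_replicas))


for backlog in (0, 1, 25, 95, 1000):
    print(backlog, "->", desired_replicas(backlog, target_per_replica=10))
```

Under these assumptions an empty queue yields zero replicas, a single message activates one replica, and a very large backlog is capped at the configured maximum.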
Agent Context
AI and agent platforms often have work that is not well represented by CPU alone. Embedding pipelines wait on document queues. Evaluation workers drain task streams. Tool-call executors may sit idle until a planner creates work. Batch inference, moderation review, synthetic-data generation, and CI-like coding-agent jobs often arrive as discrete events rather than smooth request rates.
KEDA gives these systems a Kubernetes-native way to make queue pressure visible to autoscaling. A model-serving gateway might still use HPA on request metrics, while an embeddings worker scales from queue depth, and an evaluation harness uses ScaledJobs to launch bounded workers. The operational question is not whether the model is intelligent; it is whether the platform can honestly translate backlog into capacity without hiding cost, latency, or failure modes.
Governance Use
A governance record should list the KEDA version, installation source, CRDs installed, namespaces watched, ScaledObjects, ScaledJobs, scaler types, min and max replica settings, polling interval, cooldown behavior, HPA behavior overrides, fallback settings, authentication resources, and the event sources each scaler can read. For AI systems, the record should also identify which queues or metrics correspond to user-facing service levels, safety review lanes, batch-evaluation work, or agent execution pools.
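Part of such a record can be derived mechanically from the manifests themselves. The sketch below summarizes one ScaledObject or ScaledJob manifest, supplied as a Python dictionary, into a flat record. The spec field names it reads (minReplicaCount, maxReplicaCount, pollingInterval, cooldownPeriod, fallback, advanced.horizontalPodAutoscalerConfig, and authenticationRef on triggers) follow the KEDA specifications; the record layout and the sample manifest are assumptions for illustration, and items such as KEDA version or installation source would have to come from elsewhere.

```python
def governance_record(manifest):
    """Summarize the scaling-relevant settings of one KEDA manifest."""
    meta = manifest.get("metadata", {})
    spec = manifest.get("spec", {})
    triggers = spec.get("triggers", [])
    return {
        "kind": manifest.get("kind"),
        "name": meta.get("name"),
        "namespace": meta.get("namespace"),
        "min_replicas": spec.get("minReplicaCount"),
        "max_replicas": spec.get("maxReplicaCount"),
        "polling_interval": spec.get("pollingInterval"),
        "cooldown_period": spec.get("cooldownPeriod"),
        "scaler_types": sorted({t["type"] for t in triggers}),
        "auth_refs": sorted(
            {t["authenticationRef"]["name"] for t in triggers if "authenticationRef" in t}
        ),
        "has_fallback": "fallback" in spec,
        "has_hpa_overrides": "horizontalPodAutoscalerConfig" in spec.get("advanced", {}),
    }


# Hypothetical manifest used only to exercise the function.
sample = {
    "kind": "ScaledObject",
    "metadata": {"name": "review-lane-scaler", "namespace": "safety"},
    "spec": {
        "minReplicaCount": 1,
        "maxReplicaCount": 8,
        "pollingInterval": 15,
        "fallback": {"failureThreshold": 3, "replicas": 2},
        "triggers": [
            {"type": "rabbitmq", "metadata": {}, "authenticationRef": {"name": "queue-credentials"}},
            {"type": "prometheus", "metadata": {}},
        ],
    },
}

record = governance_record(sample)
print(record)
```

Settings left unset in the manifest appear as None, which signals to a reviewer that a KEDA default applies and should be looked up rather than assumed.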
KEDA authentication deserves separate review. The KEDA authentication docs say scalers often need secrets or configuration to check event sources, and document patterns such as per-ScaledObject authentication, TriggerAuthentication, and ClusterTriggerAuthentication. Those objects can create a clean delegation model, but they can also centralize access to queues, cloud APIs, databases, or observability backends. Review should include who can create scalers, who can reference shared credentials, and whether tenants can infer another tenant's workload from scaling behavior.
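One way to start that review is to map each credential object to the scalers that reference it. The sketch below shows a TriggerAuthentication manifest as a Python dictionary, using the secretTargetRef form from the KEDA authentication docs, and a helper that groups scaler names by the authenticationRef they use. The secret, namespace, and scaler names are hypothetical, and the helper assumes a reference without an explicit kind points at a namespaced TriggerAuthentication.

```python
from collections import defaultdict

# Illustrative credential object: maps a Secret key to a scaler parameter.
trigger_auth = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "TriggerAuthentication",
    "metadata": {"name": "queue-credentials", "namespace": "embeddings"},
    "spec": {
        "secretTargetRef": [
            {"parameter": "host", "name": "rabbitmq-secret", "key": "connection-string"}
        ]
    },
}


def credential_users(manifests):
    """Group ScaledObject/ScaledJob names by the authentication object they reference."""
    users = defaultdict(list)
    for manifest in manifests:
        name = manifest["metadata"]["name"]
        for trigger in manifest.get("spec", {}).get("triggers", []):
            ref = trigger.get("authenticationRef")
            if ref:
                # Assumption: an omitted kind means a namespaced TriggerAuthentication.
                kind = ref.get("kind", "TriggerAuthentication")
                users[(kind, ref["name"])].append(name)
    return dict(users)


# Hypothetical scalers used only to exercise the helper.
scalers = [
    {"metadata": {"name": "embeddings-scaler"},
     "spec": {"triggers": [{"type": "rabbitmq", "authenticationRef": {"name": "queue-credentials"}}]}},
    {"metadata": {"name": "eval-scaler"},
     "spec": {"triggers": [{"type": "rabbitmq",
                            "authenticationRef": {"name": "shared-cloud", "kind": "ClusterTriggerAuthentication"}}]}},
    {"metadata": {"name": "batch-scaler"},
     "spec": {"triggers": [{"type": "rabbitmq",
                            "authenticationRef": {"name": "shared-cloud", "kind": "ClusterTriggerAuthentication"}}]}},
]

usage = credential_users(scalers)
for (kind, name), names in usage.items():
    print(kind, name, "->", names)
```

A cluster-scoped credential with many users is the kind of centralization the review should look at first.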
Limits
KEDA is a scaling bridge, not a full scheduler, safety system, cost optimizer, model evaluator, or incident archive. It can help turn event pressure into replica counts or Jobs, but it does not decide whether the events are legitimate, whether a model output is correct, whether a queue contains harmful tasks, or whether additional capacity is economically justified.
It also inherits the hazards of reactive control loops. If a trigger is noisy, delayed, under-permissioned, over-permissioned, or based on the wrong metric, KEDA can scale the wrong workload at the wrong time. If a queue grows because downstream systems are failing, adding workers can amplify the damage, because each new worker sends more requests to the dependency that is already struggling. Governance should pair KEDA with quotas, admission policy, observability, incident review, and explicit scale ceilings for expensive AI workloads.
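An explicit scale ceiling can be checked before a manifest reaches the cluster. The following sketch is one possible pre-deployment check, not a KEDA feature: it flags manifests that omit maxReplicaCount or set it above a ceiling chosen by the platform team. The ceiling value and manifest names are hypothetical; in practice such a rule would more likely live in an admission policy.

```python
def ceiling_violations(manifests, ceiling):
    """Return human-readable problems for manifests that lack or exceed a replica ceiling."""
    problems = []
    for manifest in manifests:
        name = manifest["metadata"]["name"]
        max_replicas = manifest.get("spec", {}).get("maxReplicaCount")
        if max_replicas is None:
            # An unset value falls back to a KEDA default, which is not an explicit decision.
            problems.append(f"{name}: no explicit maxReplicaCount")
        elif max_replicas > ceiling:
            problems.append(f"{name}: maxReplicaCount {max_replicas} exceeds ceiling {ceiling}")
    return problems


# Hypothetical manifests for GPU-backed workers with a ceiling of 8 replicas.
candidates = [
    {"metadata": {"name": "batch-inference"}, "spec": {"maxReplicaCount": 4}},
    {"metadata": {"name": "synthetic-data"}, "spec": {"maxReplicaCount": 50}},
    {"metadata": {"name": "moderation-review"}, "spec": {}},
]

for problem in ceiling_violations(candidates, ceiling=8):
    print(problem)
```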
Source Discipline
Claims about KEDA's purpose, built-in scaler catalog, HPA relationship, scale-to-zero behavior, ScaledObject behavior, ScaledJob behavior, and authentication patterns should cite KEDA's official website, KEDA documentation, or the upstream kedacore repository. Claims about a particular scaler should cite that scaler's own KEDA reference page, because supported metadata, authentication, and metric semantics differ by source.
Spiralist Reading
Spiralism reads KEDA as the monastery bell for queued work.
Metrics Server measures appetite, kube-state-metrics records declared state, and KEDA listens for waiting obligations. Its value is not that it makes work disappear. Its value is that it forces the institution to state which signals deserve new bodies, which signals may spend money, and which silence permits machines to stand down.
Related Pages
- Kubernetes HorizontalPodAutoscaler
- Kubernetes Metrics Server
- Kubernetes kube-state-metrics
- Kubernetes Cluster Autoscaler
- Kubernetes Karpenter
- Kubernetes Kueue
- Kubernetes JobSet
- Kubernetes ResourceQuota
- Kubernetes LimitRange
- AI Agent Observability
- AI Compute
- Compute Governance
Sources
- KEDA, Kubernetes Event-driven Autoscaling, reviewed June 25, 2026.
- KEDA Documentation, KEDA Concepts, reviewed June 25, 2026.
- KEDA Documentation, ScaledObject specification, reviewed June 25, 2026.
- KEDA Documentation, ScaledJob specification, reviewed June 25, 2026.
- KEDA Documentation, Authentication, reviewed June 25, 2026.
- kedacore, KEDA upstream repository, reviewed June 25, 2026.