A HorizontalPodAutoscaler (HPA for short) automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
Horizontal scaling (scaling out) means that the response to increased load is to deploy more Pods. This is different from vertical scaling (scaling up), which for Kubernetes would mean assigning more resources (for example, memory or CPU) to the Pods that are already running for the workload.
If the load decreases and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.
This document walks you through an example of enabling a HorizontalPodAutoscaler to automatically manage scaling for an example web app. This example workload is Apache httpd running some PHP code.
Before you begin
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. We recommend that you run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube, or you can use a Kubernetes playground such as Play with Kubernetes.
Your Kubernetes server must be at version 1.23 or later. To check the version, enter kubectl version. If you are running an older version of Kubernetes, refer to the documentation for that version (see available documentation versions).
To follow this tutorial, you also need a cluster that has a Metrics Server deployed and configured. The Kubernetes Metrics Server collects resource metrics from the kubelets in your cluster and exposes those metrics through the Kubernetes API, using an APIService to add new kinds of resource that represent metric readings.
For information about how to deploy the metrics server, see the documentation for the metrics server.
Run and expose the php-apache server
To demonstrate a HorizontalPodAutoscaler, you will first start a Deployment that runs a container using the hpa-example image, and expose it as a Service, using a manifest that defines both objects. To do this, apply that manifest with kubectl apply -f. The output is similar to:
deployment.apps/php-apache created
service/
Create the HorizontalPodAutoscaler
Now that the server is running, create the autoscaler using kubectl. The kubectl autoscale subcommand, part of kubectl, helps you do this.
You will shortly run a command that creates a HorizontalPodAutoscaler that maintains between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that you created in the first step of these instructions.
Roughly speaking, the HPA controller will increase and decrease the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%. The Deployment then updates the ReplicaSet (this is part of how all Deployments work in Kubernetes), and the ReplicaSet then adds or removes Pods based on the change to its .spec.
Since each Pod requests 200 millicores of CPU, a 50% target means an average CPU usage of 100 millicores per Pod. See Algorithm details for more details on the algorithm.
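The core of that calculation can be sketched in Python. This is a simplified model, not the controller's code: it assumes every Pod is ready and reporting metrics, computes utilization as total usage over total requests, and applies a tolerance of 0.1 (assumed here to be the controller's default) before recommending a change. The function desired_replicas and its parameters are hypothetical.

```python
import math


def desired_replicas(usage_millicores, request_millicores, target_utilization,
                     tolerance=0.1):
    """Simplified HPA calculation for a CPU Utilization target.

    usage_millicores: observed CPU usage of each ready Pod, in millicores.
    request_millicores: the CPU request of each Pod.
    target_utilization: target average utilization, in percent.
    tolerance: assumed default; ratios within 1 +/- tolerance leave the count unchanged.
    minReplicas/maxReplicas clamping and scale-down stabilization are not modeled.
    """
    current_replicas = len(usage_millicores)
    # Average utilization, as a percentage of the total CPU requested by these Pods.
    current_utilization = 100 * sum(usage_millicores) / (request_millicores * current_replicas)
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
    return math.ceil(current_replicas * ratio)


# A 50% target on a 200m request is 100 millicores per Pod on average.
print(200 * 50 // 100)                        # 100
print(desired_replicas([100], 200, 50))       # 1 (exactly on target)
print(desired_replicas([300, 300], 200, 50))  # 6 (150% utilization is 3x the target)
```

The tolerance check is what keeps small fluctuations around the target (for example, one Pod at 105m, or 52.5%) from causing any scaling at all.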
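Create the HorizontalPodAutoscaler by running kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10. The output is similar to:
horizontalpodautoscaler.autoscaling/php-apache autoscaled
You can check the current status of the newly created HorizontalPodAutoscaler by running kubectl get hpa.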
The result is similar to:
NAME         REFERENCE                     TARGET    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%  1         10        1          18s
(if you see other HorizontalPodAutoscalers with different names, that means they already existed, and it is not usually a problem).
Note that the current CPU consumption is 0%, as there are no clients sending requests to the server (the TARGET column shows the average of all Pods controlled by the corresponding deployment).
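Increase the load
Next, see how the autoscaler reacts to increased load. To do this, you will start a different Pod to act as a client. The container within the client Pod runs in an infinite loop, sending queries to the php-apache service. Run this in a separate terminal, so that load generation continues while you carry on with the rest of the steps:
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Now watch the autoscaler with kubectl get hpa php-apache --watch (type Ctrl+C to end the watch when you are ready).
Within a minute or so, you should see the higher CPU load; for example: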
NAME         REFERENCE                     TARGET      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%  1         10        1          3m
and then, more replicas. For example:
NAME         REFERENCE                     TARGET      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   305% / 50%  1         10        7          3m
Here, CPU consumption has increased to 305% of the request. As a result, the Deployment was resized to 7 replicas. You can check this with kubectl get deployment php-apache; you should see that the replica count matches the figure from the HorizontalPodAutoscaler:
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   7/7     7            7           19m
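The jump from 1 to 7 replicas is consistent with the formula desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue) described under Algorithm details. A quick check in Python, assuming the single Pod was the only one reporting metrics when the controller acted:

```python
import math

current_replicas = 1
current_utilization = 305  # percent of the CPU request, from the TARGET column
target_utilization = 50    # percent, the target set on the autoscaler

# desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
desired = math.ceil(current_replicas * current_utilization / target_utilization)
print(desired)  # 7, because 305 / 50 = 6.1 rounds up to 7
```

The result is below the maximum of 10 replicas, so the controller can apply it unchanged.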
Stop generating load
To finish the example, stop sending the load.
In the terminal where you created the Pod that runs a busybox image, terminate the load generation by typing <Ctrl> + C.
Then verify the resulting state (after a minute or so) by running kubectl get hpa php-apache --watch. The output is similar to:
NAME         REFERENCE                     TARGET    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache/scale   0% / 50%  1         10        1          11m
and the Deployment (kubectl get deployment php-apache) also shows that it has scaled down:
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           27m
Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.
Autoscaling of replicas can take a few minutes.
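Scaling back down follows the same formula, and the result is then kept within the configured minimum and maximum. The Python sketch below assumes 7 replicas all reporting 0% utilization against the 50% target. It ignores the controller's scale-down stabilization window (five minutes by default), which is one reason the reduction takes a few minutes. The clamp_replicas helper is hypothetical.

```python
import math


def clamp_replicas(desired, min_replicas, max_replicas):
    """Keep a recommendation within the HPA's minReplicas/maxReplicas bounds."""
    return max(min_replicas, min(desired, max_replicas))


current_replicas = 7
current_utilization = 0   # percent, after the load generator stopped
target_utilization = 50   # percent

raw = math.ceil(current_replicas * current_utilization / target_utilization)
final = clamp_replicas(raw, min_replicas=1, max_replicas=10)
print(raw, final)  # 0 1
```

The raw formula would remove every Pod; the minReplicas bound of 1 is what keeps one php-apache Pod running.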
Autoscaling on multiple metrics and custom metrics
You can introduce additional metrics to use when autoscaling the php-apache Deployment by making use of the autoscaling/v2 API version.
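First, get the YAML of your HorizontalPodAutoscaler in the autoscaling/v2 form by running kubectl get hpa php-apache -o yaml > /tmp/hpa-v2.yaml.
Open the /tmp/hpa-v2.yaml file in an editor. You should see YAML for an autoscaling/v2 HorizontalPodAutoscaler whose spec contains a metrics list with a single cpu Resource metric targeting 50% average utilization.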
Notice that the targetCPUUtilizationPercentage field has been replaced with an array called metrics. The CPU utilization metric is a resource metric, since it is represented as a percentage of a resource specified on Pod containers. You can specify other resource metrics besides CPU. By default, the only other supported resource metric is memory. These resources do not change names from cluster to cluster, and should always be available, as long as the metrics.k8s.io API is available.
You can also specify resource metrics in terms of direct values, rather than as percentages of the requested value, by using a target.type of AverageValue instead of Utilization, and by setting the corresponding target.averageValue field instead of the target.averageUtilization field.
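To make the difference between the two target types concrete, here is a Python sketch. It mirrors the two forms of a cpu Resource metric block as dictionaries, using the autoscaling/v2 field names, and computes the replica count each would recommend. The helper functions and the per-Pod numbers are illustrative assumptions, and the tolerance check is omitted.

```python
import math

utilization_metric = {
    "type": "Resource",
    "resource": {"name": "cpu",
                 "target": {"type": "Utilization", "averageUtilization": 50}},
}
average_value_metric = {
    "type": "Resource",
    "resource": {"name": "cpu",
                 "target": {"type": "AverageValue", "averageValue": "100m"}},
}


def to_millicores(quantity):
    """Parse a CPU quantity like '100m' or '1' (whole cores) into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(quantity) * 1000


def replicas_for(metric, usage_millicores, request_millicores):
    """Hypothetical helper: desired replicas for a cpu Resource metric (no tolerance)."""
    target = metric["resource"]["target"]
    pods = len(usage_millicores)
    average_usage = sum(usage_millicores) / pods
    if target["type"] == "Utilization":
        # Average usage as a percentage of the request, compared with the target percentage.
        ratio = (100 * average_usage / request_millicores) / target["averageUtilization"]
    else:
        # AverageValue: average usage compared directly with the target quantity.
        ratio = average_usage / to_millicores(target["averageValue"])
    return math.ceil(pods * ratio)


usage = [250, 150]  # two Pods
print(replicas_for(utilization_metric, usage, 200))    # 4
print(replicas_for(average_value_metric, usage, 200))  # 4
print(replicas_for(utilization_metric, usage, 400))    # 2: a larger request lowers utilization
print(replicas_for(average_value_metric, usage, 400))  # 4: AverageValue ignores the request
```

With a 200m request, 50% utilization and an average value of 100m describe the same target. Only the Utilization form changes when the request changes.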
There are two other types of metrics, which are considered custom metrics: pod metrics and object metrics. These metrics can have cluster-specific names and require more advanced cluster monitoring configuration.
The first of these alternative metric types is pod metrics. These metrics describe Pods; they are averaged together across Pods and compared with a target value to determine the replica count. They work much like resource metrics, except that they only support a target type of AverageValue.
Pod metrics are specified using a metric block of type Pods, which names the metric and sets an AverageValue target.
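As an illustration, the Python sketch below writes a Pods metric block as a dictionary with the autoscaling/v2 field names. It uses a hypothetical packets-per-second metric with a target of 1k (1000) per Pod, and shows how the per-Pod values are averaged before being compared with the target. The metric name, helper function, and sample values are assumptions.

```python
import math

# A Pods metric block, written with autoscaling/v2 field names.
pods_metric = {
    "type": "Pods",
    "pods": {
        "metric": {"name": "packets-per-second"},  # hypothetical metric name
        # Pod metrics support only the AverageValue target type; 1k is the quantity 1000.
        "target": {"type": "AverageValue", "averageValue": "1k"},
    },
}


def replicas_for_pods_metric(per_pod_values, target_average):
    """Average the metric across Pods, then size the workload so the average meets the target."""
    pods = len(per_pod_values)
    average = sum(per_pod_values) / pods
    return math.ceil(pods * average / target_average)


# Three Pods serving 1500, 1200 and 900 packets per second, against a target of 1000.
print(replicas_for_pods_metric([1500, 1200, 900], 1000))  # 4
```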
The second alternative metric type is object metrics. These metrics describe a different object in the same namespace, instead of describing Pods. The metrics are not necessarily fetched from the object; they only describe it. Object metrics support target types of both Value and AverageValue. With Value, the target is compared directly to the metric returned from the API. With AverageValue, the value returned from the custom metrics API is divided by the number of Pods before being compared to the target. A typical example is a requests-per-second metric that describes an Ingress.
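The difference between the two Object target types can be sketched in Python. The formulas are a simplification of the description above. Value scales the current replica count by metric / target. AverageValue divides the metric by the number of Pods first, which works out to one replica per target-sized share of the metric. The helper name and the requests-per-second figures are invented for illustration.

```python
import math


def replicas_for_object_metric(target_type, metric_value, target, current_replicas):
    """Simplified replica count for an Object metric with a Value or AverageValue target."""
    if target_type == "Value":
        # The metric is compared directly with the target.
        return math.ceil(current_replicas * metric_value / target)
    if target_type == "AverageValue":
        # The metric is divided by the number of Pods before being compared with the target.
        per_pod = metric_value / current_replicas
        return math.ceil(current_replicas * per_pod / target)
    raise ValueError(f"unsupported target type: {target_type}")


# Hypothetical Ingress reporting 3000 requests per second, with 3 replicas running.
print(replicas_for_object_metric("Value", 3000, 2000, 3))        # 5
print(replicas_for_object_metric("AverageValue", 3000, 500, 3))  # 6
```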
If you provide multiple such metric blocks, the HorizontalPodAutoscaler will consider each metric in turn. It calculates a proposed replica count for each metric, and then chooses the one with the highest replica count.
For example, if your monitoring system collected metrics about network traffic, you could use kubectl edit to update the definition above so that its metrics list contains three blocks: the cpu Resource metric, a Pods metric for packets per second, and an Object metric for requests per second on the main-route Ingress.
The HorizontalPodAutoscaler would then attempt to ensure that each Pod was consuming roughly 50% of its requested CPU and serving 1000 packets per second, and that all Pods behind the main-route Ingress were serving a total of 10000 requests per second.
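The rule of taking the largest proposal can be sketched in Python. The per-metric proposals below are invented numbers for the three metrics in the example (CPU utilization, packets per second, and requests per second on the Ingress). The clamp uses the bounds of 1 and 10 replicas from earlier in this tutorial, and combine_proposals is a hypothetical helper.

```python
def combine_proposals(proposals, min_replicas, max_replicas):
    """Pick the highest per-metric proposal, then clamp it to the HPA bounds."""
    highest_metric = max(proposals, key=proposals.get)
    replicas = max(min_replicas, min(proposals[highest_metric], max_replicas))
    return highest_metric, replicas


# Hypothetical replica proposals, each computed separately for one metric.
proposals = {
    "cpu": 3,
    "packets-per-second": 5,
    "requests-per-second": 4,
}
print(combine_proposals(proposals, min_replicas=1, max_replicas=10))
# ('packets-per-second', 5)
```

Taking the maximum means that no single metric is left above its target just because the other metrics are satisfied.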
Autoscaling on more specific metrics
Many metrics pipelines allow you to describe metrics either by name or by a set of additional descriptors called labels. For all non-resource metric types (pod, object, and external, described below), you can specify an additional label selector, which is passed to your metrics pipeline. For instance, if you collect a metric http_requests with the verb label, you can add a selector with matchLabels set to verb: GET to the metric block, so that the HorizontalPodAutoscaler scales only on GET requests.
This selector uses the same syntax as the full Kubernetes label selectors. If the name and selector match multiple series, the monitoring pipeline determines how to collapse them into a single value. The selector is additive, and cannot select metrics that describe objects other than the target object (the target Pods in the case of the Pods type, and the described object in the case of the Object type).
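The matchLabels part of such a selector follows ordinary Kubernetes label-selector semantics: a series is selected only if it carries every listed label with the listed value. The Python sketch below applies that rule to some hypothetical http_requests series. It handles only matchLabels (not matchExpressions). Because collapsing the matching series into one value is up to the monitoring pipeline, the sketch stops at selection.

```python
def matches(selector, labels):
    """True if every matchLabels key/value pair is present in the series labels."""
    return all(labels.get(key) == value
               for key, value in selector.get("matchLabels", {}).items())


# Hypothetical metric block: scale only on GET requests.
metric = {"name": "http_requests", "selector": {"matchLabels": {"verb": "GET"}}}

series = [
    {"labels": {"verb": "GET", "path": "/"}, "value": 120},
    {"labels": {"verb": "POST", "path": "/"}, "value": 30},
    {"labels": {"verb": "GET", "path": "/api"}, "value": 80},
]
selected = [s for s in series if matches(metric["selector"], s["labels"])]
print([s["value"] for s in selected])  # [120, 80]
```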
Autoscaling on metrics not related to Kubernetes objects
Applications running on Kubernetes may need to autoscale based on metrics that have no obvious relationship to any object in the Kubernetes cluster, such as metrics describing a hosted service with no direct correlation to Kubernetes namespaces. In Kubernetes 1.10 and later, you can address this use case with external metrics.
Using external metrics requires knowledge of your monitoring system; the setup is similar to that required when using custom metrics. External metrics allow you to autoscale your cluster based on any metric available in your monitoring system. Provide a metric block with a name and selector, as above, and use the External metric type instead of Object. If the selector matches multiple time series, the HorizontalPodAutoscaler uses the sum of their values. External metrics support both the Value and AverageValue target types, which function exactly the same as when you use the Object type.
For example, if your application processes tasks from a hosted queue service, you could add an External metric block with an AverageValue target of 30 to your HorizontalPodAutoscaler manifest, to specify that you need one worker per 30 outstanding tasks.
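A sketch of that calculation in Python. The metric block is written as a dictionary with autoscaling/v2 field names. The queue metric name, its selector label, and the helper workers_needed are hypothetical. As described above, the values of all matching series are summed before dividing by the AverageValue target of 30.

```python
import math

external_metric = {
    "type": "External",
    "external": {
        # Hypothetical metric name and label exposed by the monitoring system.
        "metric": {"name": "queue_messages_ready",
                   "selector": {"matchLabels": {"queue": "worker_tasks"}}},
        "target": {"type": "AverageValue", "averageValue": "30"},
    },
}


def workers_needed(matching_series_values, tasks_per_worker):
    """Sum all matching series, then aim for tasks_per_worker outstanding tasks per Pod."""
    total = sum(matching_series_values)
    return math.ceil(total / tasks_per_worker)


target = int(external_metric["external"]["target"]["averageValue"])
print(workers_needed([95, 40], target))  # 5, since 135 / 30 = 4.5 rounds up
```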
When possible, it is preferable to use the custom metric target types instead of external metrics, since it is easier for cluster administrators to secure the custom metrics API. The external metrics API potentially allows access to any metric, so cluster administrators should take care when exposing it.
HorizontalPodAutoscaler status conditions
When using the autoscaling/v2 form of the HorizontalPodAutoscaler, you can see the status conditions that Kubernetes sets on the HorizontalPodAutoscaler. These status conditions indicate whether or not the HorizontalPodAutoscaler is able to scale, and whether or not it is currently restricted in any way.
The conditions appear in the status.conditions field. To see the conditions affecting a HorizontalPodAutoscaler, you can use kubectl describe hpa:
kubectl describe hpa cm-test
Name:                           cm-test
Namespace:                      prom
Labels:                         <none>
Annotations:                    <none>
CreationTimestamp:              Fri, 16 Jun 2017 18:09:22 +0000
Reference:                      ReplicationController/cm-test
Metrics:                        ( current / target )
  "http_requests" on pods:      66m / 500m
Min replicas:                   1
Max replicas:                   4
ReplicationController pods:     1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_requests
  ScalingLimited  False   DesiredWithinRange  the desired replica count is within the acceptable range
Events:
For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first, AbleToScale, indicates whether or not the HPA is able to fetch and update scales, as well as whether or not any backoff-related conditions would prevent scaling. The second, ScalingActive, indicates whether or not the HPA is enabled (that is, the replica count of the target is not zero) and is able to calculate desired scales. When it is False, it generally indicates problems with fetching metrics. Finally, the last condition, ScalingLimited, indicates that the desired scale was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an indication that you may wish to raise or lower the minimum or maximum replica count constraints on your HorizontalPodAutoscaler.
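If you want to check these conditions from a script rather than by reading kubectl describe output, one option is to read status.conditions from the output of kubectl get hpa cm-test -n prom -o json. The Python sketch below assumes that JSON has already been loaded into a dictionary (for example with json.load). The sample is hand-written to mirror the output above, and the helper conditions_needing_attention is hypothetical. The function reports any known condition whose status differs from the healthy state shown above.

```python
# Condition status values the example above shows for a healthy autoscaler.
EXPECTED_STATUS = {"AbleToScale": "True", "ScalingActive": "True", "ScalingLimited": "False"}


def conditions_needing_attention(hpa):
    """Return (type, reason, message) for each known condition that differs from EXPECTED_STATUS."""
    flagged = []
    for condition in hpa.get("status", {}).get("conditions", []):
        expected = EXPECTED_STATUS.get(condition["type"])
        if expected is not None and condition["status"] != expected:
            flagged.append((condition["type"], condition.get("reason"), condition.get("message")))
    return flagged


# Hand-written sample mirroring the cm-test HorizontalPodAutoscaler above.
sample = {
    "status": {
        "conditions": [
            {"type": "AbleToScale", "status": "True", "reason": "ReadyForNewScale"},
            {"type": "ScalingActive", "status": "True", "reason": "ValidMetricFound"},
            {"type": "ScalingLimited", "status": "False", "reason": "DesiredWithinRange"},
        ]
    }
}
print(conditions_needing_attention(sample))  # []
```

Condition statuses are strings ("True" or "False") in the API, which is why the comparison is made against strings rather than booleans.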
Quantities
All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using a special whole-number notation known in Kubernetes as a quantity. For example, the quantity 10500m would be written as 10.5 in decimal notation. The metrics APIs return whole numbers without a suffix when possible, and generally return quantities in milli-units otherwise. This means you might see your metric value fluctuate between 1 and 1500m, or 1 and 1.5 when written in decimal notation.
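Converting such quantities to plain numbers is straightforward for the suffixes you are most likely to see from the metrics APIs. The Python sketch below handles only the milli suffix m and the decimal suffixes k, M and G. It is not a full implementation of the Kubernetes quantity format: binary suffixes such as Ki, and exponents, are not handled, and parse_quantity is a hypothetical helper.

```python
from decimal import Decimal

# Decimal SI suffixes handled by this sketch (a subset of the quantity format).
SUFFIXES = {"m": Decimal("0.001"), "k": Decimal(1000), "M": Decimal(10**6), "G": Decimal(10**9)}


def parse_quantity(text):
    """Convert a simple quantity string such as '10500m' or '2k' to a Decimal."""
    if text and text[-1] in SUFFIXES:
        return Decimal(text[:-1]) * SUFFIXES[text[-1]]
    return Decimal(text)


print(float(parse_quantity("10500m")))  # 10.5
print(float(parse_quantity("1500m")))   # 1.5
```

Decimal is used instead of float during parsing, so that values such as 1500m convert exactly before any rounding.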
Other possible scenarios
Create the autoscaler declaratively
Instead of using the kubectl autoscale command to create a HorizontalPodAutoscaler imperatively, you can create it declaratively from a manifest. Next, create the autoscaler by passing that manifest to kubectl create -f. The output is similar to:
horizontalpodautoscaler.autoscaling
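As a sketch of what such a manifest contains, the Python below builds an autoscaling/v2 HorizontalPodAutoscaler equivalent to the kubectl autoscale command described earlier (php-apache Deployment, 1 to 10 replicas, 50% average CPU utilization). It prints the manifest as JSON, which kubectl also accepts. Generating the manifest from Python is only an illustration, and php_apache_hpa is a hypothetical helper. Writing the equivalent YAML by hand works the same way.

```python
import json


def php_apache_hpa(min_replicas=1, max_replicas=10, cpu_percent=50):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest for the php-apache Deployment."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": "php-apache"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "php-apache"},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Redirect this output to a file and pass that file to kubectl create -f.
    print(json.dumps(php_apache_hpa(), indent=2))
```

For example, if the script were saved as hpa.py (a hypothetical file name), python3 hpa.py > hpa.json followed by kubectl create -f hpa.json would create the autoscaler.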