Kubernetes Horizontal Pod Autoscaler
In this post I will talk about the Kubernetes Horizontal Pod Autoscaler.
I will go over the concepts behind it and why we might want to use it.
What is the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler (HPA) is a resource which targets a specific workload, like a Deployment or a StatefulSet.
Its responsibility is to scale the target resource (scaleTargetRef) out in response to an increase in demand, and to scale it back in when the load drops.
The HPA does this by increasing or decreasing the desired replica count of a workload. In other words, the HPA can ask the workload to change the number of pods which are deployed.
One thing to note is that K8s implements the HPA as a control loop, as opposed to a continuous process. This check is performed every 15s by default, although the interval is configurable via the --horizontal-pod-autoscaler-sync-period flag on the kube-controller-manager.
What is the heuristic used by the HPA?
The HPA keeps track of a given metric for the target resource. This is how it decides whether to scale out or in. Fundamentally, the HPA applies the following calculation when making this decision:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
This desiredReplicas value is used to drive the number of replicas required for the workload.
See the K8s docs for further detail on this algorithm.
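As a quick illustration (with made-up numbers): say we currently have 4 replicas, the observed value of our metric is 80, and the desired value is 50. Then:
desiredReplicas = ceil[4 * ( 80 / 50 )] = ceil[6.4] = 7
So the HPA would drive the workload up to 7 replicas. A ratio close to 1 leaves the replica count unchanged.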
Out of the box, K8s gives us the ability to use CPU or memory as the basis of the desiredMetricValue for the HPA.
The HPA controller gets these scraped metrics from the resource metrics API (for CPU or memory). Alternatively, we can point the HPA controller at a custom or external metrics API, which opens up the possibility of using metrics that are not supported out of the box by K8s. This is useful in situations where CPU or memory are not suitable triggers to base our scaling on. The first example that might come to mind is the horizontal scaling of a worker/consumer component; in that case we might be more inclined to scale on the number of messages sitting on the queue it subscribes to.
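To make that concrete, here is a rough sketch of an autoscaling/v2 HPA spec combining a built-in resource metric with an external one. The external metric name and target value are hypothetical; in practice they depend on the metrics adapter installed in the cluster:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # Built-in resource metric, served by the metrics-server
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    # External metric, served by a custom/external metrics adapter
    # (hypothetical metric name and target)
    - type: External
      external:
        metric:
          name: queue_messages_ready
        target:
          type: AverageValue
          averageValue: "30"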
Scaling policies
Scaling policies can be added to an HPA separately for the scaleUp and scaleDown actions.
When multiple metrics are specified for an HPA, the scaling algorithm is applied to each metric individually, and the largest result (desiredReplicas) is the one selected and passed to the controller for the scaling operation.
When multiple policies are specified, the policy which allows the greatest change (the largest desiredReplicas) is, by default, the one which is selected and applied for the scaling operation.
In this example from the K8s docs:
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4
      periodSeconds: 60
    - type: Percent
      value: 10
      periodSeconds: 60
Here the first policy allows a maximum of 4 replicas to be scaled down in 1 minute, whereas the second policy allows at most 10% of the current replicas to be scaled down in the same 1 minute window.
Combining policies like this can introduce quite a lot of complexity, due to the various permutations in which the policies kick in, but it can be useful if you need that level of granularity.
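To make that concrete with the policies above: if the workload currently has 80 replicas, the Pods policy permits removing 4 pods over the next minute, while the Percent policy permits removing 8 (10% of 80), so the Percent policy is selected and at most 8 replicas can be removed. Once the replica count falls below 40, the Pods policy allows the larger change and becomes the one selected instead.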
Thrashing
In my post about the Kubernetes Cluster Autoscaler, I spoke briefly about the phenomenon of thrashing/flapping. When we apply our algorithm we need to ensure that we do not allow opposing scaling actions to be applied to our workload in quick succession. This would result in the HPA seemingly continuously scaling our workload in and out.
This can also happen when the resource metric is changing at a high rate, for example if the metric spikes and dips around the trigger value over a short period of time.
The stabilizationWindowSeconds setting tells an HPA to check against the desired states computed over the specified window when deciding whether to perform a scaling action.
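As a sketch, a scale-down stabilisation window can be declared alongside the policies we saw earlier (300 seconds happens to be the default that K8s applies to scale-down):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15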
Demo | Setting up the deployment
So, we’ve got enough theory to back up our understanding of the HPA. Now let’s take a look at what this looks like in action. Please note that the following exercise is purely for demonstration purposes. In practice, we would not want to deploy our HPA in this imperative manner.
~ via v14.21.1 on ☁️ (eu-west-1)
❯ kubectl create namespace hpa-testing
namespace/hpa-testing created
With this command I will create a namespace for our demo. A namespace is an isolated environment which we can use to separate our resources from others within the scope of the same cluster.
~ via v14.21.1 on ☁️ (eu-west-1)
❯ helm install --set 'resources.limits.cpu=200m' \
--set 'resources.limits.memory=200Mi' \
--set 'resources.requests.cpu=200m' \
--set 'resources.requests.memory=200Mi' \
testing-app bitnami/apache -n hpa-testing
NAME: testing-app
LAST DEPLOYED: Fri Dec 2 20:06:43 2022
NAMESPACE: hpa-testing
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: apache
CHART VERSION: 9.2.7
APP VERSION: 2.4.54
** Please be patient while the chart is being deployed **
1. Get the Apache URL by running:
** Please ensure an external IP is associated to the my-release-apache service before proceeding **
** Watch the status using: kubectl get svc --namespace hpa-testing -w my-release-apache **
export SERVICE_IP=$(kubectl get svc --namespace hpa-testing my-release-apache --template "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}")
echo URL : http://$SERVICE_IP/
WARNING: You did not provide a custom web application. Apache will be deployed with a default page. Check the README section "Deploying your custom web application" in https://github.com/bitnami/charts/blob/main/bitnami/apache/README.md#deploying-a-custom-web-application.
Here I have installed a Helm chart for an Apache server into the hpa-testing namespace we created previously. Note that we explicitly set CPU and memory requests (and limits) for the pods; the HPA calculates resource utilisation as a percentage of a pod's requests, so utilisation-based scaling won't work without them.
For the purposes of this demonstration, we will need the URL that our Apache application is exposed on. So, following the steps described in the chart notes above (using our release's service name, testing-app-apache):
export SERVICE_IP=$(kubectl get svc --namespace hpa-testing testing-app-apache --template "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}")
echo URL : http://$SERVICE_IP/
Make a note of that URL value, as we will need it soon.
~ via v14.21.1 on ☁️ (eu-west-1)
❯ kubectl get pods -n hpa-testing
NAME READY STATUS RESTARTS AGE
testing-app-apache-87fddb6df-j75bt 0/1 Running 0 41s
The only thing to really note here is that we now have an application running.
It’s not really important that it’s an Apache application.
What is important is that we will use this as our target workload to scale in response to load.
~ via v14.21.1 on ☁️ (eu-west-1)
❯ kubectl get deployments -n hpa-testing
NAME READY UP-TO-DATE AVAILABLE AGE
testing-app-apache 0/1 1 0 45s
Checking our Deployment resource, we can see that the name has been set to testing-app-apache.
We'll use this name to set the target for our HPA.
For our HPA to get hold of resource metrics, we will need to ensure that we have a metrics-server running within our cluster.
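If one isn't already installed, the upstream metrics-server project documents an installation along these lines (it's worth checking the project's README for the current recommendation):
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml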
Running the following command should give an output that looks something like this:
~ via v14.21.1 on ☁️ (eu-west-1)
❯ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-172-22-106-126.eu-west-1.compute.internal 204m 5% 3852Mi 25%
ip-172-22-114-103.eu-west-1.compute.internal 165m 4% 3556Mi 23%
ip-172-22-42-85.eu-west-1.compute.internal 311m 7% 3007Mi 20%
ip-172-22-63-218.eu-west-1.compute.internal 285m 7% 3179Mi 21%
ip-172-22-73-69.eu-west-1.compute.internal 194m 4% 3466Mi 23%
ip-172-22-80-208.eu-west-1.compute.internal 243m 6% 4647Mi 31%
ip-172-22-92-218.eu-west-1.compute.internal 296m 7% 4162Mi 28%
If we get a result back, then we can proceed with relative confidence, knowing that we have a metrics-server running.
Demo | Attaching an HPA to our deployment
Now, let’s get our HPA running and pointed at our application:
~ via v14.21.1 on ☁️ default
❯ kubectl autoscale deployment testing-app-apache --memory-percent=50 --min=1 --max=10 -n hpa-testing
horizontalpodautoscaler.autoscaling/testing-app-apache autoscaled
Here we've imperatively created an HPA with the following conditions:
- When the average memory utilisation across all the pods goes above or below 50%, trigger a scaling action.
- We want at least 1 replica running at all times.
- No matter how high our load gets, we only allow a maximum of 10 replicas running.
The first point is important to note here. Say we had scaled out to 2 pods in the following scenario:

| Replica | Memory utilisation (%) |
| ------- | ---------------------- |
| Pod A | 90 |
| Pod B | 30 |
Then our average memory utilisation is (90 + 30) / 2 = 60%, which exceeds our trigger of 50%.
So in this scenario, our HPA would kick into gear and increase our number of replicas to 3.
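Plugging those numbers into the formula from earlier:
desiredReplicas = ceil[2 * ( 60 / 50 )] = ceil[2.4] = 3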
Demo | Triggering the HPA to scale out
In order to trigger the HPA to do its thing, we will need to create artificial load against our application.
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.0000001; do wget -q -O- http://$SERVICE_IP/; done"
Here we can see that, as the load on our application increases, resource utilisation increases with it.
The HPA control loop comes back around, the HPA calculates a desiredReplicas greater than the current replicas, and a scale-out action is triggered.
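To watch this happening, we can keep an eye on both the HPA and the pods while the load generator runs, with something along these lines:
kubectl get hpa testing-app-apache -n hpa-testing -w
kubectl get pods -n hpa-testing -w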
Demo | Scale in
When we exit the load-generator command, we terminate the artificial load that we had previously created.
After this point, the average memory utilisation across the pods will drop below our trigger value. The HPA will then intervene by driving the desired state with a reduced number of replicas.
Summary
As mentioned previously, the imperative approach we took to deploying our components for this demo is not ideal for a number of reasons, and it is highly unlikely we would do this in production.
In reality, we would adopt a declarative methodology, whereby we describe the resources that we want K8s to deploy for us in the form of YAML manifests, and keep those manifests under source control.
If we were also following a GitOps approach, we would want to ensure that the manifests under version control always represent the current state of our deployments. Which, as you might have guessed, is not what this demo shows!
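For reference, a rough declarative sketch of the HPA we created imperatively above might look something like this (assuming the autoscaling/v2 API and the same 50% average memory utilisation trigger):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: testing-app-apache
  namespace: hpa-testing
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: testing-app-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    # Scale on average memory utilisation across the pods,
    # measured as a percentage of each pod's memory request
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50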