Autoscale your Kubernetes pods
Hello everyone. In this blog post I will share one way to autoscale your pods based on RabbitMQ message count.
There are several ways to scale pods in Kubernetes (k8s):
- Manually scale up pods
- Based on CPU/Memory Consumption
- Based on your queue size (e.g., AWS SQS, RabbitMQ etc.)
Manually scale up pods
This is the simplest approach: you monitor the load on your application yourself and then scale with a kubectl command.
kubectl scale --replicas=3 deployment/demo-deployment
This gives you the most control over scaling decisions, but it is very time-consuming for the DevOps/monitoring team.
A variation is to set up a cron job that runs the scale command on a schedule. Of course, this only works if you already know when the heavy load will arrive.
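As a rough illustration of the cron variation, here is a minimal Python sketch that a scheduled job could run. The peak window, the replica counts, and the helper names (replicas_for, scale_command) are all made up for this example; only the kubectl scale command itself comes from the post. The script only builds the command, it does not run it.

```python
from datetime import time

# Hypothetical schedule: (start, end, replicas). Adjust to your own known peaks.
PEAK_WINDOWS = [
    (time(9, 0), time(18, 0), 3),  # assumed business-hours peak
]
DEFAULT_REPLICAS = 1  # assumed off-peak replica count


def replicas_for(now: time) -> int:
    """Return the replica count for the window containing `now`."""
    for start, end, replicas in PEAK_WINDOWS:
        if start <= now < end:
            return replicas
    return DEFAULT_REPLICAS


def scale_command(now: time, deployment: str = "demo-deployment") -> list[str]:
    """Build the same kubectl command shown above, with a scheduled replica count."""
    return [
        "kubectl",
        "scale",
        f"--replicas={replicas_for(now)}",
        f"deployment/{deployment}",
    ]


if __name__ == "__main__":
    # Print the command for a sample time; a real job could pass it to subprocess.run.
    print(" ".join(scale_command(time(10, 30))))
```

The weakness is the same as with any schedule: if the load arrives outside the window you guessed, nothing scales.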
Based on CPU/Memory Consumption
This is a more advanced version of the previous approach, where you don’t have to scale your pods up or down manually. Kubernetes provides the HPA (Horizontal Pod Autoscaler) out of the box, which scales your pods up or down based on CPU/memory consumption. In the HPA you define the target utilization at which new pods should be added, and Kubernetes takes care of the rest.
One advantage is that you don’t have to worry about the application architecture. Once autoscaling is defined on these metrics, the HPA handles it the same way whether the workload is a web app, a background processor, or something else.
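To make the threshold idea concrete, here is a small Python sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The function name and the min/max defaults are my own for illustration, and the sketch ignores details such as pod readiness and stabilization windows.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,  # default HPA tolerance; treated here as an assumption
) -> int:
    """Approximate the HPA calculation for a single metric."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 2 pods averaging 90% CPU against a 60% target.
    print(hpa_desired_replicas(2, 90, 60))
```

For example, 2 pods averaging 90% CPU against a 60% target give a ratio of 1.5, so the sketch asks for 3 pods.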
Based on your queue size
There are scenarios where even the HPA cannot give you the performance you need from autoscaling. In such cases you have to look at a more specific implementation. Here I will discuss scaling on a queue. It could be any queue: RabbitMQ, AWS SQS, Azure Service Bus, etc.
There are multiple solutions, but the best one I have found is KEDA (a CNCF project). KEDA lets you define the triggers that drive your autoscaling and the target deployment they apply to.
The example below scales the nginx deployment up to a maximum of 4 replicas based on queue length. This is very useful when you want to process messages in parallel without first hitting a CPU/memory threshold, which improves overall throughput.
One use case is heavy file processing (Excel, CSV, image or video processing) that has to finish quickly for concurrent requests.
Just FYI, KEDA does this with a custom controller that goes through the Kubernetes scale API, the same one kubectl scale uses in the manual method above. Under the hood it also creates an HPA for the target and feeds it the queue metric.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  generation: 3
  labels:
    scaledobject.keda.sh/name: nginx-scaledobject
  name: nginx-scaledobject
  namespace: dev
spec:
  maxReplicaCount: 4
  minReplicaCount: 1
  scaleTargetRef:
    name: nginx
  triggers:
  - metadata:
      activationValue: "1"
      host: amqp://guest:[email protected]:31001
      mode: QueueLength
      protocol: amqp
      queueName: Test_Keda
      value: "2"
      vhostName: /
    type: rabbitmq
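To see how the numbers in this manifest interact, here is a simplified Python sketch of the arithmetic. It assumes, as I understand the QueueLength mode, that value is the target number of messages per replica, so the replica count is roughly ceil(queue length / value), clamped between minReplicaCount and maxReplicaCount. The function name is hypothetical, and the sketch ignores the polling interval, HPA tolerance, and stabilization behaviour.

```python
import math


def replicas_for_queue(
    queue_length: int,
    value: int = 2,          # "value" from the trigger: target messages per replica
    min_replicas: int = 1,   # minReplicaCount from the manifest
    max_replicas: int = 4,   # maxReplicaCount from the manifest
) -> int:
    """Rough estimate of the replica count for a given queue length."""
    desired = math.ceil(queue_length / value)
    # Clamp to the bounds configured on the ScaledObject.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    for queued in (0, 3, 5, 8, 100):
        print(queued, "messages ->", replicas_for_queue(queued), "replica(s)")
```

With these settings an empty queue stays at 1 replica, 5 queued messages ask for 3, and anything from 7 messages upward is capped at 4. As far as I know, activationValue only affects scaling from and to zero, so with minReplicaCount set to 1 it should not change these numbers.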
Thanks for reading…