Health-based traffic control with Kubernetes

Last time we covered how the liveness probe can be integrated with Spring Boot Actuator. Today, I’m going to show an example application for the readiness probe.

Readiness probe

The readiness probe is similar to the liveness probe, but it answers a different question: is the running application allowed to serve traffic? Think about startup. The process is up, so the liveness probe reports that all is good, but before the application can really respond to requests it has to process a huge file, fill its caches from the database or contact an external service. In this case you don’t want the liveness probe to restart the application; you want Kubernetes to wait until it’s fully operational.

Another scenario is when the app has background processing responsibilities on top of a normal HTTP API. If it gets overloaded with background work, it might not have enough resources to reply to HTTP requests in time, which matters when response time is crucial. With the readiness probe you can handle this: when the application lacks the necessary resources, no traffic is sent to it until it frees up.

Configuring this type of probe is almost identical to configuring the other probes; the only difference is the name, readinessProbe.

I’m not going to go through all the probe settings; they work the same way as for a liveness probe.

Example – startup

I’m going to extend the example from the previous article, so if you are missing the context, make sure you check it here.

Moving back to writing code. We are going to simulate an application that has to load something at startup, which takes several seconds.

First of all, we need a holder for the state: whether the application is ready to serve traffic or not.
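A minimal sketch of such a holder, assuming a plain class with a thread-safe flag. The name ReadinessHolder and its methods are illustrative, not taken from the article’s repository; in the Spring application it would typically be a singleton bean (for example annotated with @Component) so that the startup task and the controller share one instance.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical state holder; the class and method names are assumptions.
class ReadinessHolder {
    // AtomicBoolean makes a write from a background thread visible to request threads.
    private final AtomicBoolean ready = new AtomicBoolean(false);

    // The application starts in the "not ready" state.
    boolean isReady() {
        return ready.get();
    }

    void setReady(boolean value) {
        ready.set(value);
    }
}
```

The flag starts as false on purpose: until something explicitly marks the application ready, the readiness endpoint should report that it is not.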

Now the startup load simulation. I’m going to use the TaskExecutor interface from Spring to execute an asynchronous task that will set the isReady attribute to true after 20 seconds. The implementation looks like this:

In the task, I’m logging a simple message so we can verify it in the Kubernetes logs, then sleeping the thread for 20 seconds and after that setting the isReady state to true.
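The task described above could be sketched roughly as follows. This is not the article’s original code: StartupLoader, its constructor parameters and the log messages are assumptions, and an AtomicBoolean stands in for the state holder. The sketch depends only on java.util.concurrent.Executor, which Spring’s TaskExecutor extends, so a Spring TaskExecutor bean could be passed in as the executor.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical startup loader; names and messages are illustrative only.
class StartupLoader {
    private final Executor executor;   // a Spring TaskExecutor is also an Executor
    private final AtomicBoolean ready; // stands in for the state holder
    private final long loadMillis;     // how long the simulated load takes

    StartupLoader(Executor executor, AtomicBoolean ready, long loadMillis) {
        this.executor = executor;
        this.ready = ready;
        this.loadMillis = loadMillis;
    }

    // Hands the simulated load to the executor and returns immediately
    // when the executor is asynchronous.
    void start() {
        executor.execute(() -> {
            System.out.println("Simulating startup load...");
            try {
                Thread.sleep(loadMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and leave the application not ready.
                Thread.currentThread().interrupt();
                return;
            }
            ready.set(true);
            System.out.println("Startup load finished, application is ready");
        });
    }
}
```

For the 20 second delay used in this article, the loader would be created with a loadMillis of 20_000 and started once the application context is up.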

Next up, we need to expose this information on HTTP. I’m creating a new controller:

A single GET endpoint reads the value from the holder and responds with HTTP 200 when the application is ready, or HTTP 503 when it is not.
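To illustrate the mapping from the flag to the status code without pulling in Spring, here is a sketch that uses the JDK’s built-in com.sun.net.httpserver.HttpServer instead of a Spring controller. That substitution, the ReadyEndpoint name and the AtomicBoolean standing in for the holder are all assumptions made to keep the example self-contained; in the Spring application the same logic would live in a controller method mapped to /ready.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the Spring controller, built on the JDK HTTP server.
class ReadyEndpoint {
    // 200 tells the probe the pod may receive traffic, 503 tells it to hold off.
    static int statusFor(boolean ready) {
        return ready ? 200 : 503;
    }

    // Starts a small server exposing /ready; passing port 0 lets the OS pick a free port.
    // For brevity the handler answers every HTTP method, not only GET.
    static HttpServer start(int port, AtomicBoolean ready) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ready", exchange -> {
            // A response length of -1 means that no response body is sent.
            exchange.sendResponseHeaders(statusFor(ready.get()), -1);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

Kubernetes treats any HTTP status from 200 up to, but not including, 400 as a successful probe, so 503 is reported as a failure and the pod is taken out of rotation without being restarted.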

Now the deployment descriptor:

I’ve made 2 changes compared to the previous article. First, I’ve added the readinessProbe, mapped to the /ready endpoint we created. Second, I’ve changed the Service descriptor to NodePort so it’s easier to access for the test; you can use the original descriptor if you want, though. NodePort only means that the API can be accessed through the Kubernetes node directly. For minikube, you can use minikube ip to get the address, and then http://<ip>:31704 will be the root of the API.

Next up, let’s deploy the application. It’s the usual exercise: build the jar, then the image, and apply the Kubernetes descriptor. Don’t forget to execute eval $(minikube docker-env) if you are using minikube.

Observing the running pods:

The -w flag watches for changes. Inspecting the output:

It’s clearly visible that after 20 seconds the application changed its ready state, just like we implemented it. While the pod is not ready, no request will be served. So if you try to execute for example the following command during startup:

As soon as the readiness probe reports that the pod is ready, executing the same command results in a proper response:

That’s it. The readiness probe is working properly and it doesn’t let traffic go to the pod until it’s reported healthy.

Example – background processing

The other use for the readiness probe is when the application is running low on resources, for example when background processing that uses threads from a thread pool is part of the application. If it gets overloaded, the resources might not be sufficient to serve HTTP requests in an acceptable manner.

I hope you didn’t expect me to give you a full-blown background processing engine that will starve the compute power needed for an HTTP API. Rather I’m just going to emulate the insufficient resource state by setting a flag.

Compared to the previous example, we are making a single change for now: exposing an HTTP API to switch the ready flag manually.

The new holder class with the switchReady method:
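A possible shape for the extended holder is sketched below. Only the switchReady method name comes from the text; the class name, the return value and the use of an AtomicBoolean are assumptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical holder extended with a toggle; only switchReady is named in the article.
class SwitchableReadinessHolder {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    boolean isReady() {
        return ready.get();
    }

    void setReady(boolean value) {
        ready.set(value);
    }

    // Atomically flips the flag and returns the new value, so two concurrent
    // switch requests cannot both act on the same stale state.
    boolean switchReady() {
        boolean current;
        do {
            current = ready.get();
        } while (!ready.compareAndSet(current, !current));
        return !current;
    }
}
```

Returning the new value is a convenience so that the HTTP API can echo the resulting state back to the caller.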

And the controller with the new /readyswitch API:

Build the application again and deploy it. After the initial readiness probe lets traffic through to the pod, we can simply switch the readiness flag so Kubernetes stops forwarding requests to the pod.

Verifying the API works after startup:

Switching the flag:

Verifying the API doesn’t respond anymore:

And Kubernetes is showing the pod as not ready:

Of course switching it back through the exposed port is not possible anymore as Kubernetes stopped sending HTTP traffic to the pod. We can still exec into the container though and switch the flag back:

Exiting the container and checking what Kubernetes thinks about the pod:

Now it’s back to operation and traffic is allowed.

The real benefit kicks in when you are running the application in multiple instances. To demonstrate this, let’s create a dummy endpoint that logs a single message.

Making the application run in 2 instances needs a small tweak (the replicas attribute):

Open 3 terminals, 2 for monitoring the application logs for the 2 instances and one for executing the requests.

For watching the logs continuously:

Execute this command against the 2 pods you have. Then call the /dummy API from the 3rd terminal.

Now Kubernetes is balancing the requests between the 2 replicas as they are both ready. You can see it in the logs as well:

Sometimes the first pod is serving the request, sometimes the other.

And now the most exciting part: what happens if we switch one of the pods to not ready.

Requests to the dummy API are now always served by the ready pod:

If you switch the non-ready pod back, it will start responding to requests again.
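The behaviour observed above can be captured in a toy model. This is not how Kubernetes is implemented: the class name, the pod names and the round-robin order are assumptions chosen to make the example deterministic, and the real distribution of requests between ready pods is not guaranteed to be round-robin. The only point is that pods marked as not ready are skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a Service that only forwards requests to ready pods.
class ReadyOnlyBalancer {
    // Insertion-ordered map from pod name to its readiness state.
    private final Map<String, Boolean> pods = new LinkedHashMap<>();
    private int next = 0;

    void setReady(String pod, boolean ready) {
        pods.put(pod, ready);
    }

    // Returns the pod that serves the next request, or null when no pod is ready.
    String route() {
        List<String> readyPods = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : pods.entrySet()) {
            if (entry.getValue()) {
                readyPods.add(entry.getKey());
            }
        }
        if (readyPods.isEmpty()) {
            return null;
        }
        String chosen = readyPods.get(next % readyPods.size());
        next++;
        return chosen;
    }
}
```

With two ready pods the model alternates between them; once one is switched to not ready, every call to route() returns the remaining pod until the other one becomes ready again.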

Conclusion

We’ve checked 2 scenarios where the readiness probe is useful. The whole purpose of health checks is to create more resilient applications, and I encourage you to invest some time into doing it properly. The investment will definitely pay off.

As usual, the code can be found on GitHub. If you liked the article, give it a thumbs up and share it. If you are interested in more, make sure you follow me on Twitter.
