On Linkerd and Kubernetes jobs


How to properly shut down a meshed pod controlled by a job

At Ingrid we use Kubernetes. Most of the things we do (at least on the back-end) are deployed as Kubernetes (micro)services, communicating via either gRPC or Google Cloud Pub/Sub.

We also have a couple of daily jobs, which fetch a fresh list of pickup points from a carrier (say, DHL or PostNord Logistics) and store it in our cache. You can see these pickup points (and select one of them) in our checkout widget. This is implemented using Kubernetes cron jobs.

Enter: Linkerd

Recently we started meshing our pods using Linkerd. While most of the process has been very smooth, properly meshing cron jobs required a bit of extra work.

For the regular services, it was enough to annotate the deployment:

spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled

Based on this annotation, Linkerd injects its proxy container into our deployments. So our pickup point cache service changed from this:

> kubectl get pods | grep pickup
service-pickup-point-65ff8dd97d-dx94n 1/1 Running 0 3d8h
service-pickup-point-65ff8dd97d-t8pdp 1/1 Running 0 3d8h

to this:

> kubectl get pods | grep pickup
service-pickup-point-66c9cd96c8-k5c2g 2/2 Running 0 11h
service-pickup-point-66c9cd96c8-kwqm9 2/2 Running 0 11h

Everything worked pretty well. The service mesh properly load balanced the requests and encrypted connections to other meshed services.

(Cron)jobs and Linkerd

We tried the same approach with our cron jobs and added the same annotation. The job ran and completed just fine. The problem was that the pod never finished, so Kubernetes would not clean it up:

> kubectl get pods | grep worker | grep pnl
worker-pickup-points-1625455800-7ssls 1/2 NotReady 0 13h

Upon inspection, we saw that our main container completed all right, but linkerd-proxy was still running:

> kubectl get pods -o json worker-pickup-points-1625455800-7ssls | \
jq '.status.containerStatuses[] | {name, state}'

{
  "name": "linkerd-proxy",
  "state": {
    "running": {
      "startedAt": "2021-07-05T16:41:10Z"
    }
  }
}
{
  "name": "worker-pickup-points",
  "state": {
    "terminated": {
      "exitCode": 0,
      "finishedAt": "2021-07-05T16:41:28Z",
      "reason": "Completed",
      "startedAt": "2021-07-05T16:41:10Z"
    }
  }
}

Pod termination in Kubernetes

What happens if I type kubectl delete pod service-pickup-point-66c9cd96c8-k5c2g? Kubernetes sends a TERM signal to process 1 inside each container. The processes then have 30 seconds to shut down gracefully (this period can be changed with the terminationGracePeriodSeconds setting). After this time, any process that is still running receives a KILL signal.
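For reference, the grace period is part of the pod spec. A minimal sketch (the value and the container details are illustrative, not our actual configuration):

spec:
  template:
    spec:
      # time between the TERM and KILL signals; the default is 30 seconds
      terminationGracePeriodSeconds: 60
      containers:
        - name: my-app
          image: example/my-app:latest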

Something similar happens when Kubernetes decides to kill a pod for another reason: a deployment update, or a scale-down triggered by the horizontal pod autoscaler.

Of course, there are a few conditions and settings which can make this flow a little more complicated. You might want to read the official docs.

In a "simple scenario", a well-behaved pod controlled by deployment will start, do it's work (handle the requests) and shutdown (quickly and gracefully) on TERM singal.

Jobs are a little different. Only the job knows when it is finished. In the "happy scenario", the main process terminates by itself and returns exit code 0.

The situation gets tricky if a pod has two containers: the "regular" one and the sidecar proxy. The regular one will finish. But what about the sidecar proxy? Why should it finish? How could it know that the "main" container has finished? Well, it can't know that, so it keeps running.

This is why Kubernetes does not see the pod as completed: one of its containers is still running.

The solution

The proxy container needs to be terminated, and it needs to happen just after the main container completes. Maybe Kubernetes can do it for us?

Unfortunately, no. As of the time of writing (2021-07-05), there is no way to inform Kubernetes about dependencies between containers. Here is an issue which tracks this problem. And here is another.

This is a problem "with Kubernetes" (more like a missing feature than a bug), which needs to be solved by each service mesh.

You need to close the sidecar container manually, from the "main" container. In Linkerd, you send a POST request to the proxy's /shutdown endpoint; in Istio, you call /quitquitquit.

In short, the solution is to kill the proxy using:

curl -XPOST localhost:4191/shutdown
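In Go, such a helper boils down to a single HTTP call. Here is a minimal sketch (the function name, timeout and error handling are illustrative, not the actual upstart code):

package main

import (
    "fmt"
    "log"
    "net/http"
    "time"
)

// shutdownProxy asks the local linkerd-proxy to exit by POSTing to its
// admin endpoint (4191 is the proxy's default admin port).
func shutdownProxy() error {
    client := &http.Client{Timeout: 5 * time.Second}
    resp, err := client.Post("http://localhost:4191/shutdown", "", nil)
    if err != nil {
        return fmt.Errorf("shutting down linkerd-proxy: %w", err)
    }
    defer resp.Body.Close()
    if resp.StatusCode >= 300 {
        return fmt.Errorf("unexpected status from linkerd-proxy: %s", resp.Status)
    }
    return nil
}

func main() {
    // This has to be the very last step of the shutdown sequence: once the
    // proxy is gone, no further outbound connections will work.
    if err := shutdownProxy(); err != nil {
        log.Print(err)
    }
}

The only tricky part is when you call it, which is exactly where the bug described later comes from.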

The implementation

At Ingrid, we have an internal library called upstart, which we use as boilerplate for starting and shutting down our pods. Most of our workers look like this:

func main() {
    config, closer, err := upstart.Init()
    if err != nil {
        log.Fatal(err)
    }
    defer closer()
    if err := doWork(config); err != nil {
        log.Error(err)
    }
}

What we needed to do was teach our internal upstart library to send the shutdown signal to the Linkerd proxy:

package upstart

@@ -91,6 +96,12 @@ func Process(appName, build string, cfg Config) (Result, Closer, error) {
    closer := func() {
        if tracecloser != nil {
            tracecloser.Close()
        }
        log.Logger.Closer()
+       // cfg.LinkerdForceShutdown is automatically read
+       // from ENV in init.
+       if cfg.LinkerdForceShutdown {
+           // shutdownLinkerdProxy sends a POST to /shutdown
+           shutdownLinkerdProxy(cfg.LinkerdForceShutdown)
+       }
    }
    // closer should be called in `defer` in `main.go`
    return cfg, closer, nil

What remained was to update the library version in each worker and adjust the env in our cron job YAML files.
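A sketch of the corresponding cron job manifest (the environment variable name and the image are made up for illustration; the actual names are internal):

spec:
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            linkerd.io/inject: enabled
        spec:
          containers:
            - name: worker-pickup-points
              image: example/worker-pickup-points:latest
              env:
                # hypothetical variable name, read by upstart in init
                - name: LINKERD_FORCE_SHUTDOWN
                  value: "true"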

A bug

The first implementation of this change had a nasty little bug. Can you spot it?

package upstart

@@ -91,6 +96,12 @@ func Process(appName, build string, cfg Config) (Result, Closer, error) {
    closer := func() {
        if tracecloser != nil {
            tracecloser.Close()
        }
+       // cfg.LinkerdForceShutdown is automatically read
+       // from ENV in init.
+       if cfg.LinkerdForceShutdown {
+           // shutdownLinkerdProxy sends a POST to /shutdown
+           shutdownLinkerdProxy(cfg.LinkerdForceShutdown)
+       }
        log.Logger.Closer()
    }
    return cfg, closer, nil

In this version, the Linkerd proxy is shut down before the log closer runs. Once the proxy is gone, no external connections are possible, so if the logger needs to call an external service, for example to report outstanding errors to Sentry, it will fail.

An alternative solution

Actually, Linkerd provides a solution where you don't need to modify your code. There is linkerd-await: a small program which you add to your Dockerfile. It wraps your application and "sends a message" to the Linkerd proxy when your application finishes.

It is basically the same idea as above (or to be clear: our solution is an adaptation of linkerd-await), but you put it into the Dockerfile instead of the code (see the sketch below). This has a few advantages:

  • you will not have bugs with cleanup order, like the one described above,
  • it is language agnostic: you don't need to implement a Linkerd closer for each language,
  • it is already implemented (as opposed to code you would need to write).
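For illustration, the relevant part of a Dockerfile might look roughly like this (the binary locations and the worker name are placeholders, not our actual template):

# linkerd-await is a small static binary; copy it (or download a release)
# into the image next to the application
COPY linkerd-await /linkerd-await
COPY worker /worker

# linkerd-await waits for the Linkerd proxy to become ready before starting
# the wrapped command; with --shutdown it also asks the proxy to shut down
# once that command exits
ENTRYPOINT ["/linkerd-await", "--shutdown", "--"]
CMD ["/worker"]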

At Ingrid, we have an internal Docker template, common to all services implemented in Go. Because we first evaluated Linkerd as a proof of concept, making changes to a library felt less intrusive than changing this Dockerfile. Still, we might rethink this in the future.


Cover photo by @finalhugh on Unsplash

Does Ingrid sound like an interesting place to work? We are always looking for good people! Check out our open positions.