If you are using a Persistent Volume Claim (PVC) and have the following issue:
```
Multi-Attach error for volume "pvc-X" Volume is already used by pod(s) Y
```
Changing your deployment from:

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  template:
    # ...
```

to:

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  strategy:
    type: Recreate  # <-- That's the important part
  template:
    # ...
```
will fix your issue. Read on for more deets.
I’ve been learning Kubernetes on the job lately, and it’s a rough ride… I’m hoping to write up my experience with this beast sometime in the future.
I’ve had this issue for the past few months and did not find a proper workaround for it besides just trying over and over until it fixed itself… Today I finally found a proper fix, so I’m sharing it here so that:
- I can remember wtf I did
- Other people encountering this issue could maybe stumble on this obscure part of the internet which is this page
I have a Deployment with a Persistent Volume Claim (PVC from now on) so that my service can store some persistent data.
PVCs have different access modes; to quote the documentation:
- ReadWriteOnce – the volume can be mounted as read-write by a single node
- ReadOnlyMany – the volume can be mounted read-only by many nodes
- ReadWriteMany – the volume can be mounted as read-write by many nodes
DigitalOcean (where my shit is hosted) only offers ReadWriteOnce volumes.
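A PVC using that access mode looks something like this (a sketch; the name and size are placeholders, and `do-block-storage` is DigitalOcean’s block storage class):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce  # single-node read-write; the mode at the heart of this issue
  resources:
    requests:
      storage: 5Gi
  storageClassName: do-block-storage
```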
Where it gets interesting
As is fairly common, I have a Service with its corresponding Deployment. This Deployment has a PVC to store some persistent data (which would otherwise be lost each time the pod is deleted; pods should always be considered disposable).
And if your setup is similar, you’ve probably already seen the dreaded:
```
Multi-Attach error for volume "pvc-X" Volume is already used by pod(s) Y
```

This prevents the new version of your deployment from being deployed.
The issue is that, by default, Kubernetes uses what’s called a RollingUpdate deployment strategy. In most cases that’s exactly what you want: Kube will only kill the v1 pods of your deployment once the new v2 version is Ready to accept incoming traffic (maybe the deployment needs to bootstrap some data, load a bunch of files, connect to a DB, etc.).
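For reference, the default strategy spelled out explicitly looks roughly like this (the 25% values shown are the Kubernetes defaults):

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # how many extra pods may be created during the rollout
      maxUnavailable: 25%  # how many pods may be unavailable during the rollout
  template:
    # ...
```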
This allows your app to have virtually no downtime as you roll out newer versions of your code.
This does not work in my case, however, because for the v2 pod to become Ready it needs the PVC attached; but since that PVC (ReadWriteOnce, remember) is already attached to the v1 pod, this can never happen, resulting in a deadlock.
A more brutal, but necessary in this setup, method is to use the Recreate DeploymentStrategy. This will immediately kill the old version of your deployment, freeing the PVC so that the newer version can spawn its pod.
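Putting it all together, a minimal sketch of such a Deployment (the `my-app` and `my-data` names are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: Recreate  # kill the old pod first, freeing the volume for the new one
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:v2
          volumeMounts:
            - name: data
              mountPath: /var/lib/my-app
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-data
```

One gotcha when switching an existing Deployment over: the API rejects a manifest that sets `type: Recreate` while still carrying a `spec.strategy.rollingUpdate` block, so remove that block as well if you had one.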
Writing this, it really feels like it should not have taken me this long to figure out a solution to this issue. Googling around did not yield an immediate answer, and I stumbled upon the workaround by reading a random suggestion from a random dude on a random GitHub issue.
Maybe if I had just RTFM…..