46  How to fix a broken Docker-in-Docker socket

When using Docker-in-Docker (dind), a build may be requested before dind has started. In that case, the volume mount that exposes /var/run/dind/docker.sock to the build container can happen before dind has created the socket, and the mount then creates a directory at the mount point instead (which we don’t want). Once that directory exists, Docker-in-Docker remains inaccessible until /var/run/dind is manually deleted and the dind pod is restarted.
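
To confirm this failure mode on a node, it helps to check what actually sits at the socket path. The following is a minimal Python sketch and not part of the deployment; the function name classify_socket_path is hypothetical, and the default path is the one named above.

```python
import os
import stat


def classify_socket_path(path="/var/run/dind/docker.sock"):
    """Report what exists at path: 'socket', 'directory', 'missing', or 'other'."""
    try:
        # lstat inspects the path itself and does not follow symlinks.
        mode = os.lstat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"      # healthy: dind created its socket
    if stat.S_ISDIR(mode):
        return "directory"   # broken: the mount created a directory first
    return "other"


if __name__ == "__main__":
    print(classify_socket_path())
```

A result of "directory" corresponds to the broken state described above; "socket" means dind created the socket as intended.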

46.1 Spotting the problem

Build pods fail to run, and the dind pods are stuck in CrashLoopBackOff.
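
The same symptom can be checked from a script. The sketch below is illustrative only: it assumes kubectl is configured for the cluster, the function names are hypothetical, and the namespace is deployment-specific and must be supplied by the caller. It filters the JSON pod list for binderhub-dind- pods whose container is waiting with the reason CrashLoopBackOff.

```python
import json
import subprocess


def crashlooping_dind_pods(pod_list, prefix="binderhub-dind-"):
    """Return (name, restart_count) for dind pods waiting in CrashLoopBackOff."""
    found = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(prefix):
            continue
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append((name, cs.get("restartCount", 0)))
                break  # one report per pod is enough
    return found


def load_pods(namespace):
    """Fetch the pod list as parsed JSON (assumes kubectl can reach the cluster)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling crashlooping_dind_pods(load_pods(namespace)) with the namespace of the BinderHub deployment returns an empty list when no dind pod is crash-looping.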

46.2 Band-aiding the problem

46.2.1 Bots

We implemented a bot to monitor the issue; its source code is available at https://github.com/gesiscss/orc2/blob/main/ansible/usr/bin/orc2-fix-dind-bot.py.

46.2.2 OpenLens

Note

For an introduction to using OpenLens, see Chapter 37.

  1. Open OpenLens and connect to the cluster.

    Screenshot of OpenLens showing cluster dashboard.
  2. In the navigation bar on the left, click Workloads and then Pods.

    Screenshot of OpenLens listing pods.
  3. Search for the binderhub-dind- pod that has many restarts. Click on the node name for that pod to open the node details.

    Screenshot of OpenLens listing pods and showing node details.
  4. In the node details navigation bar in the top-right corner, click the first icon (Node shell).

    Screenshot of OpenLens listing pods and showing node terminal.
  5. OpenLens opens a terminal on the node as the root user. Execute:

    rm -rf /var/run/dind/docker.sock/
  6. Select the binderhub-dind- and binderhub-image-cleaner- pods.

    Screenshot of OpenLens listing pods with some selected.
  7. Remove the selected pods by clicking the minus button in the bottom-right corner of the pod list.
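
The cleanup in step 5 could also be scripted with a guard, so that a healthy socket is never deleted by accident. This is a minimal sketch under stated assumptions, not the code of the bot linked above: the function name remove_stale_socket_dir is hypothetical, and the script would have to run as root on the affected node.

```python
import os
import shutil
import stat


def remove_stale_socket_dir(path="/var/run/dind/docker.sock"):
    """Delete path only if it is a directory; return True if it was removed."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return False  # nothing there, nothing to clean up
    if not stat.S_ISDIR(mode):
        return False  # a real socket (or any other file) is left untouched
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    print("removed" if remove_stale_socket_dir() else "nothing to remove")
```

The function only replaces the rm -rf command; the binderhub-dind- and binderhub-image-cleaner- pods still need to be removed afterwards, as in steps 6 and 7.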

46.3 Fixing the problem

No fix is available.