You’ll see a lot of advice saying that Docker containers should ideally run only a single process, with your application as PID 1. I enjoy the aspiration, but there’s no way that this works for most legacy applications. It works great when you have a single statically linked binary, like when you’re compiling a Go program. But what if you’re trying to deploy a Python + Django stack? Django itself just does NOT come with a production web server. If you look at Django’s own deployment documentation, it tells you to run it with another program. Programs like gunicorn work by spawning multiple processes to run your application.
PID 1 purists crying in the club rn.
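To make that concrete, even the simplest production Django deployment ends up multi-process. A typical gunicorn invocation (the module path here is a made-up example) immediately spawns a master process plus workers:

```
gunicorn myproject.wsgi:application --workers 4
```

That’s five processes (one master, four workers) inside a single container, before you’ve added anything exotic.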
Why do I want to run multiple processes anyways
It’s pretty common for multi-node GPU training frameworks to communicate over something like SSH. If you try to do multi-node training with DeepSpeed, you need to configure a hostfile and then configure passwordless SSH between all of the nodes. OpenMPI does the same.
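For reference, a DeepSpeed hostfile is just a list of hostnames and GPU slot counts, something like this (the hostnames are placeholders):

```
worker-1 slots=8
worker-2 slots=8
```

DeepSpeed’s launcher then SSHes into each host to start the training processes, which is exactly why sshd needs to be running inside the container.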
In non-containerized environments, you might use a process manager like systemd or supervisord. Docker actually documents examples of using supervisord for managing multiple processes. Which, for all intents and purposes, does exactly what we want, EXCEPT that it has no organic way to exit when a process exits. Process managers tend to want to restart any process that has exited. Even with supervisord, you can configure autorestart = false, but supervisord itself will not exit, meaning that your container will not exit either.
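As a sketch, the supervisord config in question would look something like this (the program name and script path are illustrative):

```
[supervisord]
nodaemon=true

[program:training]
command=/app/training.sh
autorestart=false
```

With autorestart=false the training program won’t be restarted when it finishes, but supervisord happily keeps running with zero children, and the container stays up forever.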
Why does exiting matter
ML training jobs are not web services. A web service is intended to run forever, and will only be shut down when it needs to be replaced. That makes sense. In theory, your website should live forever. ML training jobs are long running batch jobs. It’s fully expected that they will eventually finish. They might run for hours, days, weeks, months or longer, but eventually you’ve gone through all of your training data and you want your process to exit. So how do we do this in a container?
s6 process supervision
S6 describes itself as:
s6 is a collection of utilities revolving around process supervision and management, logging, and system initialization.
Fascinating.
On top of this, I’ve recently been made aware of s6-overlay, which describes itself as:
s6-overlay is an easy-to-install (just extract a tarball or two!) set of scripts and utilities allowing you to use existing Docker images while using s6 as a pid 1 for your container and process supervisor for your services.
Fascinating… again. This would give us exactly what we want. The ability to run sshd as a long running process while still running our own application. Process signals will be managed for us, all in a relatively straightforward process manager.
In the rest of this post, I’m going to walk through a minimal Dockerfile that uses s6-overlay and explain the pieces of it, as I understand them.
In order to make the Dockerfile easily reproducible, we’re going to embed the configuration directly into the Dockerfile so that we don’t have to worry about setting up other files locally.
FROM ubuntu:22.04
# Install sshd and xz to extract the s6-overlay tarballs
RUN <<EOF
apt-get update;
apt-get install -yq openssh-server xz-utils;
EOF
# Install s6-overlay
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-noarch.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-x86_64.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-x86_64.tar.xz
# Define sshd as a long running s6 service
COPY <<EOF /etc/s6-overlay/s6-rc.d/sshd/type
longrun
EOF
# Define entrypoint for sshd
# Create sshd's required directory
# Daemon applications tend to log to stderr, so redirect it to stdout
# Start sshd in the foreground
COPY --chmod=700 <<EOF /etc/s6-overlay/s6-rc.d/sshd/run
#!/bin/sh
mkdir -p /var/run/sshd
/usr/sbin/sshd -D -e
EOF
# Register sshd as a service for s6 to manage
RUN touch /etc/s6-overlay/s6-rc.d/user/contents.d/sshd
# Copy my ssh public keys to the container
ADD https://github.com/abatilo.keys /root/.ssh/authorized_keys
# Define the training script that s6 will run as the container's main process
WORKDIR /app
COPY --chmod=700 <<EOF /app/training.sh
#!/bin/sh
echo "Training started"
sleep 5
echo "Training finished"
EOF
ENTRYPOINT ["/init"]
CMD ["/app/training.sh"]
Let’s break it down by section:
FROM ubuntu:22.04
# Install sshd and xz to extract the s6-overlay tarballs
RUN <<EOF
apt-get update;
apt-get install -yq openssh-server xz-utils;
EOF
# Install s6-overlay
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-noarch.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-x86_64.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-x86_64.tar.xz
Installing s6-overlay is incredibly easy. I think it’s relatively unknown that Docker’s ADD directive can download from the internet, so you don’t even need to have curl or wget installed. Note that ADD won’t decompress a file fetched from a URL, though, which is why tar’s -J flag is there: it handles the xz decompression while untarring the s6 utilities into the rest of the filesystem. ezpz.
# Define sshd as a long running s6 service
COPY <<EOF /etc/s6-overlay/s6-rc.d/sshd/type
longrun
EOF
# Define entrypoint for sshd
# Create sshd's required directory
# Daemon applications tend to log to stderr, so redirect it to stdout
# Start sshd in the foreground
COPY --chmod=700 <<EOF /etc/s6-overlay/s6-rc.d/sshd/run
#!/bin/sh
mkdir -p /var/run/sshd
/usr/sbin/sshd -D -e
EOF
# Register sshd as a service for s6 to manage
RUN touch /etc/s6-overlay/s6-rc.d/user/contents.d/sshd
Okay, here’s the meat of using s6. The /etc/s6-overlay/s6-rc.d directory is where you specify all of your services. Creating the sshd directory essentially declares that you have a service that you want to run. The first file is one named type, where you define either a longrun service or a oneshot service. For sshd, that’s a long running service that I expect to never need to exit. Next, we have to create the run file in the sshd directory. This shell script creates the required directory for sshd, and then runs sshd NOT as a daemon (via the -D flag, which I think feels backwards). The last step is to create an empty file in /etc/s6-overlay/s6-rc.d/user/contents.d/, which is how you actually register which services you want s6 to execute. The name of the file needs to match the name you used in the service configuration.
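Putting those pieces together, the files created by this section of the Dockerfile lay out like this:

```
/etc/s6-overlay/s6-rc.d/
├── sshd/
│   ├── type              # contains "longrun"
│   └── run               # script that starts sshd in the foreground
└── user/
    └── contents.d/
        └── sshd          # empty file; registers the sshd service
```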
# Copy my ssh public keys to the container
ADD https://github.com/abatilo.keys /root/.ssh/authorized_keys
# Define the training script that s6 will run as the container's main process
WORKDIR /app
COPY --chmod=700 <<EOF /app/training.sh
#!/bin/sh
echo "Training started"
sleep 5
echo "Training finished"
EOF
ENTRYPOINT ["/init"]
CMD ["/app/training.sh"]
Lastly, here’s a quick way to add some public keys to the server by fetching them from GitHub, and then a mock shell script that stands in for the actual training code.
The last section here is perhaps the most fascinating. We’re combining both ENTRYPOINT and CMD. The ENTRYPOINT invokes s6 itself with /init, and then setting the CMD lets us specify the entrypoint to our main service. Doing this means that we actually instantiate s6 as our PID 1, and s6 will take the CMD as the service that we want to run. s6 will instantiate all of the services that we’ve defined, which in our case is only sshd, and once that’s running, will instantiate our main service. Once the training process exits, s6 will notice and send shutdown signals to all other services under its management, allowing for a clean exit. s6 will use the exit code of our main service as its overall exit code, so if something fails unexpectedly in the training process, we’ll be able to catch it in our upstream container orchestration layer.
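You can sketch that exit-code behavior in plain shell. This is a toy analogue of what s6 does, not s6 itself: run the main job, capture its exit code, (pretend to) stop the side services, then surface the main job’s code as the container’s:

```shell
# run the "main service" and capture its exit code
sh -c 'echo "Training started"; exit 7'
main_rc=$?

# ...this is where s6 would send shutdown signals to sshd...

echo "container would exit with code: $main_rc"
```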
Try it yourself with this Dockerfile:
docker build -t s6 https://gist.githubusercontent.com/abatilo/2b0c99d8536476f0219de2b8cc038870/raw/c942a1269d8c9a18e70e540c79a3a26750f56352/Dockerfile && docker run -p 2222:22 --rm -it s6
Thanks for reading.