systemd Shutdown Units

Designing a system to shut down gracefully can be tricky. In an ideal world, every service would be managed by a systemd unit. ExecStart would start a process that handles SIGTERM by stopping itself, and ExecStop would signal the process and block until the process and its resources have stopped gracefully.

But not all software stops gracefully or does a full teardown of what it set up. In this post, we’ll look at systemd’s shutdown behavior and strategies for writing systemd units that perform custom cleanup tasks before shutdown.

systemd

As the init system, systemd manages services from start to stop (among many other duties). It’s the first process to start on boot and the last process to stop on shutdown. Unlike the sequential scripts that came before, systemd services are oriented around systemd units with dependency and ordering relationships that allow many services to start (or stop) in parallel. These design details will loom large in our discussion:

  • Services start (and stop) in parallel (unless otherwise ordered)
  • Processes are terminated via SIGTERM, or SIGKILL after a timeout (unless otherwise configured)
  • On shutdown, services with an ordering dependency stop in the inverse start-up order
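
To make the last point concrete, here is a rough model of ordered start and stop using coreutils. It is only a sketch: app.service is a hypothetical unit, tsort stands in for systemd's ordering logic, and real systemd runs any units without an ordering relationship in parallel.

```shell
#!/bin/bash
# Each line is "A B", meaning A starts before B (B declares After=A).
# app.service is a hypothetical unit used only for illustration.
ordering='network.target app.service
app.service clean.service'

# Topologically sort the pairs to get a valid start-up order.
start_order=$(tsort <<<"$ordering")
# Shutdown walks the same ordered chain in reverse.
stop_order=$(tac <<<"$start_order")

echo "start order:"; echo "$start_order"
echo "stop order:";  echo "$stop_order"
```

The unit that started last in the chain is the first one asked to stop, which is the property the cleanup units later in this post rely on.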

Shutdown

Let’s review what happens on shutdown. Several systemctl subcommands (below) can shut down a system by activating the special systemd units reboot.target, poweroff.target, and halt.target.

systemctl halt      # Shut down and halt the system
systemctl poweroff  # Shut down and power-off the system
systemctl reboot    # Shut down and reboot the system

These target units’ Requires pull in dependencies like systemd-reboot.service, systemd-poweroff.service, and systemd-halt.service (respectively), which all recursively require shutdown.target.

# reboot.target
[Unit]
Description=System Reboot
Documentation=man:systemd.special(7)
DefaultDependencies=no
Requires=systemd-reboot.service
After=systemd-reboot.service
AllowIsolate=yes
JobTimeoutSec=30min
JobTimeoutAction=reboot-force

[Install]
Alias=ctrl-alt-del.target

sudo systemctl list-dependencies --all --recursive reboot.target
reboot.target
○ └─systemd-reboot.service
●   ├─system.slice
●   │ └─-.slice
○   ├─final.target
○   ├─shutdown.target
○   └─umount.target

By default, all units and scopes have DefaultDependencies=yes, which implicitly adds Before=shutdown.target and Conflicts=shutdown.target to them. Conflicts means that starting shutdown.target stops every conflicting unit, and those stops run in parallel (unless otherwise ordered).

When stopped, systemd units should gracefully stop running processes, free resources, and wait until that completes. A load balancer might stop accepting new connections and disable its readiness endpoint. A database might flush to disk. An agent might inform a cluster it’s leaving the group. Any processes remaining after ExecStop runs will be forcibly terminated by systemd, with SIGTERM and then SIGKILL after a timeout (unless otherwise configured).

Not all software stops gracefully or does a full teardown of what it set up. Some “manager” services deviate from systemd’s model. Some teardown tasks require coordination with a cluster system. Tools like systemd-analyze don’t help with shutdowns. And many services may not even be your own software. In these cases, cleanup actions in early shutdown units can help. Let’s look at some strategies for writing systemd units that perform custom cleanup actions before shutdown.

Cleanup Script

Start with a simple cleanup script that simulates a cleanup task. echo logs messages, the loop shows progress, and the sleep ensures it runs long enough to confirm subsequent actions wait. Notice, this script has no reliance on networking, containers, or other system components.

#!/bin/bash

echo "cleaning..."
for i in {1..3}; do
  sleep 5s
  echo "waiting..."
done

echo "done"

Put this script in /usr/local/bin/cleanup and make it executable. If you later see a systemd 203/EXEC error, revisit this section.
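
Common causes of a 203/EXEC failure include a wrong path, a missing execute bit, or a missing interpreter line. The sketch below checks for those three on a scratch copy of the script. check_exec is a hypothetical helper written for this post, not something systemd provides, and it does not cover other causes such as SELinux labels.

```shell
#!/bin/bash
# check_exec is a hypothetical helper, not part of systemd.
check_exec() {
  local path=$1
  [ -f "$path" ] || { echo "missing: $path"; return 1; }
  [ -x "$path" ] || { echo "not executable: $path"; return 1; }
  [ "$(head -c 2 "$path")" = '#!' ] || { echo "no shebang: $path"; return 1; }
  echo "ok: $path"
}

# Demonstrate on a scratch copy rather than /usr/local/bin.
workdir=$(mktemp -d)
printf '#!/bin/bash\necho "cleaning..."\n' >"$workdir/cleanup"

# The file exists but has not been made executable yet.
before=$(check_exec "$workdir/cleanup") || true
chmod 0755 "$workdir/cleanup"
after=$(check_exec "$workdir/cleanup") || true

echo "$before"
echo "$after"
rm -rf "$workdir"
```

On a real host, the same check can be pointed at /usr/local/bin/cleanup.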

ExecStart Script

Now consider a systemd oneshot unit that runs cleanup as an ExecStart script. It’s a first attempt, but as we’ll see, it’s incorrect.

# /etc/systemd/system/clean.service
[Unit]
Description=Clean on shutdown
# Before=shutdown.target is already implicit (DefaultDependencies=yes)
Before=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/cleanup

[Install]
WantedBy=shutdown.target

This unit makes shutdown.target depend on clean.service, uses Before=shutdown.target to order clean.service before shutdown.target and uses Type=oneshot so the service will be considered started when the script exits (to delay shutdown until cleanup finishes). RemainAfterExit is used to keep the service active even after the script has exited.

Enable the clean.service, systemctl reboot, and check the journal logs later…

Sep 28 23:55:01 ip-10-0-13-150 systemd[1]: Starting clean.service - Clean on shutdown...
Sep 28 23:55:01 ip-10-0-13-150 cleanup[6796]: cleaning...
-- Boot 8e4734a82c754e549c9a9292ca5988fb --

This approach isn’t right. You may not even see logs. The clean service may start, but it won’t delay shutdown. shutdown.target conflicts with all units, so clean.service gets stopped before it can finish. You might think of adding DefaultDependencies=no, but the systemd.unit man page tells us it won’t delay shutdown - “Given two units with any ordering dependency between them, if one unit is shut down and the other is started up, the shutdown is ordered before the start-up.”

ExecStop Script

Next, consider a systemd oneshot unit that runs cleanup as an ExecStop script.

# /etc/systemd/system/clean.service
[Unit]
Description=Clean on shutdown
After=multi-user.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/local/bin/cleanup

[Install]
WantedBy=multi-user.target

This unit is pulled in by multi-user.target, but ordered to start After=multi-user.target, fairly late in startup. Since ordered units are stopped in reverse start order, clean should begin stopping before other units.

Enable and start clean.service. Confirm it’s active (but exited).

● clean.service - Clean on shutdown
     Loaded: loaded (/etc/systemd/system/clean.service; enabled; vendor preset: disabled)
     Active: active (exited) since Thu 2022-09-29 20:21:17 UTC; 2min 49s ago
    Process: 1383 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 1383 (code=exited, status=0/SUCCESS)
        CPU: 2ms

Sep 29 20:21:17 ip-10-0-13-150 systemd[1]: Starting clean.service - Clean on shutdown...
Sep 29 20:21:17 ip-10-0-13-150 systemd[1]: Finished clean.service - Clean on shutdown.

Note the Boot ID in journal logs and then run systemctl reboot.

...
-- Boot 0e40d519972b4cd7bc09374b3072788d --
Sep 28 20:18:24 ip-10-0-13-150 systemd[1]: Starting clean.service - Clean on shutdown...
Sep 28 20:18:24 ip-10-0-13-150 systemd[1]: Finished clean.service - Clean on shutdown.

After reboot, check the logs. I’ve annotated when systemctl reboot was run 🔁.

...
Sep 28 20:18:24 ip-10-0-13-150 systemd[1]: Finished clean.service - Clean on shutdown.
🔁
Sep 29 20:20:38 ip-10-0-13-150 systemd[1]: Stopping clean.service - Clean on shutdown...
Sep 29 20:20:38 ip-10-0-13-150 cleanup[367051]: cleaning...
Sep 29 20:20:43 ip-10-0-13-150 cleanup[367077]: waiting...
Sep 29 20:20:48 ip-10-0-13-150 cleanup[367080]: waiting...
Sep 29 20:20:53 ip-10-0-13-150 cleanup[367132]: waiting...
Sep 29 20:20:53 ip-10-0-13-150 cleanup[367134]: done
Sep 29 20:20:53 ip-10-0-13-150 systemd[1]: clean.service: Deactivated successfully.
Sep 29 20:20:53 ip-10-0-13-150 systemd[1]: Stopped clean.service - Clean on shutdown.
-- Boot 8e97b43271024c64b0775c43dc519c5b --
Sep 29 20:21:17 ip-10-0-13-150 systemd[1]: Starting clean.service - Clean on shutdown...
Sep 29 20:21:17 ip-10-0-13-150 systemd[1]: Finished clean.service - Clean on shutdown.

When the unit stops, the cleanup script runs to completion, delaying shutdown. Then the reboot occurs. On the next boot, the unit starts (/bin/true) and finishes.
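
To pull up this kind of before-and-after view yourself, a small wrapper around journalctl can help. prev_boot_logs is a hypothetical helper name. The flags are standard journalctl options, but logs from earlier boots are only available when the journal is stored persistently.

```shell
#!/bin/bash
# prev_boot_logs is a hypothetical helper name; the flags are standard journalctl options.
prev_boot_logs() {
  local unit=$1
  # --boot=-1 selects the boot before the current one.
  journalctl --boot=-1 --unit="$unit" --no-pager
}
```

After a reboot, prev_boot_logs clean.service shows what the unit logged while the previous boot was shutting down, and journalctl --list-boots lists the boot IDs to compare against.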

It works as expected here, but there’s a caveat. Ordered units are stopped in reverse start order, but remember that many services are stopped concurrently. cleanup has no guarantee what services will be up when it’s running as ExecStop. cleanup only works because it doesn’t depend on other system components. For example, if our script used networking, we’d need to add After=network.target to ensure that on shutdown, clean stops before networking stops.
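
One way to add that ordering without editing the unit itself is a drop-in file. This is a sketch under assumptions: write_network_dropin and the 10-network.conf file name are made up for illustration, and the function writes into a scratch directory by default so it can be tried safely.

```shell
#!/bin/bash
# write_network_dropin and the 10-network.conf file name are hypothetical.
# Pass a scratch directory to experiment, or /etc/systemd/system (as root)
# to apply the drop-in for real.
write_network_dropin() {
  local root=$1
  local dir="$root/clean.service.d"
  mkdir -p "$dir"
  cat >"$dir/10-network.conf" <<'EOF'
[Unit]
After=network.target
EOF
  echo "$dir/10-network.conf"
}

scratch=$(mktemp -d)
dropin=$(write_network_dropin "$scratch")
cat "$dropin"
```

When applied under /etc/systemd/system, run systemctl daemon-reload afterwards so systemd picks up the new ordering.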

Let’s move on to a containerized cleanup task, a common case these days.

ExecStop container

Consider a systemd unit that runs a container process as its ExecStop before shutdown. Unlike the minimal script used earlier, starting a container requires many system components. And ExecStop begins concurrently with other stopping units. This won’t go well.

# /etc/systemd/system/clean.service
[Unit]
Description=Clean on shutdown
After=multi-user.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=-/usr/bin/podman rm clean
ExecStop=/usr/bin/podman run \
  --name clean \
  --log-driver=k8s-file \
  --rm \
  -v /usr/local/bin:/scripts \
  --stop-timeout=60 \
  --entrypoint /scripts/cleanup \
  docker.io/fedora:36

[Install]
WantedBy=multi-user.target

Enable and start clean.service. When clean.service is stopped manually, ExecStop will create a container, mount /usr/local/bin, and run the same cleanup script fine.

Sep 29 21:36:41 ip-10-0-13-150 systemd[1]: Starting clean.service - Clean on shutdown...
Sep 29 21:36:41 ip-10-0-13-150 systemd[1]: Finished clean.service - Clean on shutdown.
Sep 29 21:37:35 ip-10-0-13-150 systemd[1]: Stopping clean.service - Clean on shutdown...
Sep 29 21:37:35 ip-10-0-13-150 podman[10701]: 2022-09-29 21:37:35.32232541 +0000 UTC m=+0.076846159 container create 2e0a21b085113ad6b5eab83a1f4b85081045727b711c89909dc7d204abb25e61 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 21:37:35 ip-10-0-13-150 podman[10701]: 2022-09-29 21:37:35.292987568 +0000 UTC m=+0.047508358 image pull  docker.io/fedora:36
Sep 29 21:37:35 ip-10-0-13-150 podman[10701]: 2022-09-29 21:37:35.516155501 +0000 UTC m=+0.270676209 container init 2e0a21b085113ad6b5eab83a1f4b85081045727b711c89909dc7d204abb25e61 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 21:37:35 ip-10-0-13-150 podman[10701]: 2022-09-29 21:37:35.5276157 +0000 UTC m=+0.282136425 container start 2e0a21b085113ad6b5eab83a1f4b85081045727b711c89909dc7d204abb25e61 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 21:37:35 ip-10-0-13-150 podman[10701]: 2022-09-29 21:37:35.527985526 +0000 UTC m=+0.282506267 container attach 2e0a21b085113ad6b5eab83a1f4b85081045727b711c89909dc7d204abb25e61 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 21:37:35 ip-10-0-13-150 podman[10701]: cleaning...
Sep 29 21:37:40 ip-10-0-13-150 podman[10701]: waiting...
Sep 29 21:37:45 ip-10-0-13-150 podman[10701]: waiting...
Sep 29 21:37:50 ip-10-0-13-150 podman[10701]: waiting...
Sep 29 21:37:50 ip-10-0-13-150 podman[10701]: done
Sep 29 21:37:50 ip-10-0-13-150 podman[10701]: 2022-09-29 21:37:50.536310798 +0000 UTC m=+15.290831522 container died 2e0a21b085113ad6b5eab83a1f4b85081045727b711c89909dc7d204abb25e61 (image=docker.io/library/fedora:36, name=clean, health_status=)
Sep 29 21:37:50 ip-10-0-13-150 podman[10873]: 2022-09-29 21:37:50.707597051 +0000 UTC m=+0.115808702 container remove 2e0a21b085113ad6b5eab83a1f4b85081045727b711c89909dc7d204abb25e61 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 21:37:50 ip-10-0-13-150 systemd[1]: clean.service: Deactivated successfully.
Sep 29 21:37:50 ip-10-0-13-150 systemd[1]: Stopped clean.service - Clean on shutdown.

But during a real shutdown, ExecStop won’t be able to create a container.

Sep 29 21:41:21 ip-10-0-13-150 systemd[1]: Stopping clean.service - Clean on shutdown...
Sep 29 21:41:21 ip-10-0-13-150 podman[12002]: 2022-09-29 21:41:21.263765577 +0000 UTC m=+0.192250288 container create 214a91497691b33a2ee77a0ad6dc3b3894e102abf84d641f5df2abfc842d1cd0 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 21:41:21 ip-10-0-13-150 podman[12002]: 2022-09-29 21:41:21.190543212 +0000 UTC m=+0.119027932 image pull  docker.io/fedora:36
Sep 29 21:41:21 ip-10-0-13-150 podman[12002]: time="2022-09-29T21:41:21Z" level=error msg="Unable to clean up network for container 214a91497691b33a2ee77a0ad6dc3b3894e102abf84d641f5df2abfc842d1cd0: \"error tearing down network namespace configuration for container 214a91497691b33a2ee77a0ad6dc3b3894e102abf84d641f5df2abfc842d1cd0: netavark: failed to delete if podman0: Received a netlink error message Operation not supported (os error 95)\""
Sep 29 21:41:21 ip-10-0-13-150 podman[12072]: 2022-09-29 21:41:21.689689685 +0000 UTC m=+0.185177533 container remove 214a91497691b33a2ee77a0ad6dc3b3894e102abf84d641f5df2abfc842d1cd0 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 21:41:21 ip-10-0-13-150 podman[12002]: Error: OCI runtime error: crun: sd-bus call: Transaction for libpod-214a91497691b33a2ee77a0ad6dc3b3894e102abf84d641f5df2abfc842d1cd0.scope/start is destructive (shutdown.target has 'start' job queued, but 'stop' is included in transaction).: Resource deadlock avoided
Sep 29 21:41:21 ip-10-0-13-150 systemd[1]: clean.service: Control process exited, code=exited, status=126/n/a
Sep 29 21:41:21 ip-10-0-13-150 systemd[1]: clean.service: Failed with result 'exit-code'.
Sep 29 21:41:21 ip-10-0-13-150 systemd[1]: Stopped clean.service - Clean on shutdown.
-- Boot 332c28360e38479c91f5cab4898413b4 --
Sep 29 21:41:45 ip-10-0-13-150 systemd[1]: Starting clean.service - Clean on shutdown...
Sep 29 21:41:45 ip-10-0-13-150 systemd[1]: Finished clean.service - Clean on shutdown.

I’ve made this mistake myself and seen users struggle with it without recognizing it. With today’s daemonless container runners, it’s difficult to define ordering dependencies to ensure a container can reliably start as part of shutdown. This is an open problem area that depends on the container runner.

ExecStop an existing container

Let’s be more clever. If we can’t easily start a container during shutdown, let’s have the container running, but awaiting a signal. Create a new script called await to reflect the new approach. Run this script directly to verify it waits, then prints messages when you Ctrl-C.

#!/bin/bash

cleanup() {
  echo "cleaning..."
  for i in {1..3}; do
    sleep 5s
    echo "waiting..."
  done

  echo "done"
}

trap cleanup SIGINT SIGTERM

echo "Awaiting signals"
sleep infinity & wait $!

We take care that the await bash script will run as process 1 in the container and handle signals. Bash runs a trap handler only after the current foreground command (like sleep) finishes, so it’s critical we background sleep and use the builtin wait, which a trapped signal interrupts immediately.
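
The difference is easy to measure. The sketch below starts a child bash with a TERM trap, sends it SIGTERM, and reports how many whole seconds it took to exit, once with a foreground sleep and once with a background sleep plus wait. react_time is a hypothetical helper, and the durations are arbitrary.

```shell
#!/bin/bash
# react_time is a hypothetical helper: it starts a child bash with a TERM trap,
# sends SIGTERM, and prints how many whole seconds the child took to exit.
react_time() {
  local body=$1 start
  bash -c "trap 'exit 0' TERM; $body" >/dev/null 2>&1 &
  local pid=$!
  sleep 1                   # let the child install its trap
  start=$(date +%s)
  kill -TERM "$pid"
  wait "$pid"
  echo $(( $(date +%s) - start ))
}

foreground=$(react_time 'sleep 4')            # trap runs only after sleep ends
background=$(react_time 'sleep 4 & wait $!')  # wait is interrupted right away

echo "foreground sleep: exited after about ${foreground}s"
echo "background sleep + wait: exited after about ${background}s"
```

The foreground variant only reacts once its sleep has finished, while the background variant reacts almost immediately. During shutdown that gap counts against the stop timeout.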

Create, enable, and start a new clean.service systemd unit.

# /etc/systemd/system/clean.service
[Unit]
Description=Clean on shutdown
After=multi-user.target

[Service]
Type=simple
ExecStartPre=-/usr/bin/podman rm clean
ExecStart=/usr/bin/podman run \
  --name clean \
  --log-driver=k8s-file \
  --rm \
  -v /usr/local/bin:/scripts \
  --stop-timeout=60 \
  --entrypoint /scripts/await \
  docker.io/fedora:36
ExecStop=/usr/bin/podman stop clean
TimeoutStopSec=180

[Install]
WantedBy=multi-user.target

This unit is a typical long-running service with Type=simple. Podman starts a container and proxies signals (like SIGTERM on shutdown) to the container’s first process. When the unit stops, podman stop sends a SIGTERM to the container’s first process too (and a SIGKILL after --stop-timeout, which defaults to 10s).

As before, the unit is pulled in by multi-user.target, but ordered to start After multi-user.target, fairly late in startup. Ordered units are stopped in reverse start order, so clean should begin stopping before other units. The service TimeoutStopSec determines how long systemd should wait for ExecStop before SIGKILL.
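
The three durations need to nest: the cleanup itself (about 15s here) has to fit inside podman’s --stop-timeout, which in turn has to fit inside TimeoutStopSec. A quick sanity check might look like this. The values are copied from the unit above, and cleanup_secs assumes the 3 x 5s loop from the script.

```shell
#!/bin/bash
# Values copied from the unit above; cleanup_secs assumes the 3 x 5s loop.
unit='ExecStart=/usr/bin/podman run --name clean --stop-timeout=60 --entrypoint /scripts/await docker.io/fedora:36
TimeoutStopSec=180'

stop_timeout=$(grep -o -- '--stop-timeout=[0-9]*' <<<"$unit" | cut -d= -f2)
unit_timeout=$(grep -o 'TimeoutStopSec=[0-9]*' <<<"$unit" | cut -d= -f2)
cleanup_secs=$(( 3 * 5 ))

# Each timeout should leave room for the one nested inside it.
if (( cleanup_secs < stop_timeout && stop_timeout < unit_timeout )); then
  verdict="ok"
else
  verdict="too tight"
fi
echo "cleanup=${cleanup_secs}s podman=${stop_timeout}s systemd=${unit_timeout}s: $verdict"
```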

Confirm that the unit is active and the container is running, awaiting a signal.

systemctl status clean.service
● clean.service - Clean on shutdown
     Loaded: loaded (/etc/systemd/system/clean.service; enabled; vendor preset: disabled)
     Active: active (running) since Fri 2022-10-21 16:34:41 UTC; 1min 12s ago
   Main PID: 239609 (podman)
      Tasks: 9 (limit: 4427)
     Memory: 18.6M
        CPU: 275ms
     CGroup: /system.slice/clean.service
             ├─ 239609 /usr/bin/podman run --name clean --log-driver=k8s-file --rm -v /usr/local/bin:/scripts --stop-timeout=60 --entrypoint /scripts/await docker.io/fedora:36
             └─ 239680 /usr/bin/conmon --api-version 1 -c 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 -u 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70/userdata -p /run/containers/storage/overlay-containers/21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70/userdata/pidfile -n clean --exit-dir /run/libpod/exits --full-attach -s -l k8s-file:/var/lib/containers/storage/overlay-containers/21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70/userdata/ctr.log --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/containers/storage/overlay-containers/21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70/userdata/oci-log --conmon-pidfile /run/containers/storage/overlay-containers/21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /run/containers/storage --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/libpod --exit-command-arg --network-config-dir --exit-command-arg "" --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /var/lib/containers/storage/volumes --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 
21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70

Sep 29 16:34:41 ip-10-0-0-27 systemd[1]: Started clean.service - Clean on shutdown.
Sep 29 16:34:42 ip-10-0-0-27 podman[239609]: 2022-10-21 16:34:42.02827346 +0000 UTC m=+0.073162795 container create 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 16:34:42 ip-10-0-0-27 podman[239609]: 2022-10-21 16:34:41.999309821 +0000 UTC m=+0.044199164 image pull  docker.io/fedora:36
Sep 29 16:34:42 ip-10-0-0-27 podman[239609]: 2022-10-21 16:34:42.253726035 +0000 UTC m=+0.298615370 container init 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 16:34:42 ip-10-0-0-27 podman[239609]: 2022-10-21 16:34:42.263454996 +0000 UTC m=+0.308344331 container start 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 16:34:42 ip-10-0-0-27 podman[239609]: 2022-10-21 16:34:42.263872042 +0000 UTC m=+0.308761624 container attach 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 16:34:42 ip-10-0-0-27 podman[239609]: Awaiting signals

Note the boot id and run systemctl reboot (annotated 🔁). After reboot, check the logs.

Sep 29 16:34:42 ip-10-0-0-27 podman[239609]: Awaiting signals
🔁
Sep 29 16:42:06 ip-10-0-0-27 podman[239609]: cleaning...
Sep 29 16:42:06 ip-10-0-0-27 podman[239609]: Terminated
Sep 29 16:42:06 ip-10-0-0-27 systemd[1]: Stopping clean.service - Clean on shutdown...
Sep 29 16:42:11 ip-10-0-0-27 podman[239609]: cleaning...
Sep 29 16:42:16 ip-10-0-0-27 podman[239609]: waiting...
Sep 29 16:42:21 ip-10-0-0-27 podman[239609]: waiting...
Sep 29 16:42:26 ip-10-0-0-27 podman[239609]: waiting...
Sep 29 16:42:26 ip-10-0-0-27 podman[239609]: done
Sep 29 16:42:26 ip-10-0-0-27 podman[239609]: waiting...
Sep 29 16:42:31 ip-10-0-0-27 podman[239609]: waiting...
Sep 29 16:42:36 ip-10-0-0-27 podman[239609]: waiting...
Sep 29 16:42:36 ip-10-0-0-27 podman[239609]: done
Sep 29 16:42:36 ip-10-0-0-27 podman[239609]: 2022-10-21 16:42:36.44559863 +0000 UTC m=+474.490487974 container died 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 (image=docker.io/library/fedora:36, name=clean, health_status=)
Sep 29 16:42:36 ip-10-0-0-27 podman[241961]: 2022-10-21 16:42:36.80223871 +0000 UTC m=+30.342432111 container cleanup 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 16:42:36 ip-10-0-0-27 podman[241961]: clean
Sep 29 16:42:36 ip-10-0-0-27 podman[239609]: 2022-10-21 16:42:36.907856847 +0000 UTC m=+474.952746174 container remove 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 16:42:36 ip-10-0-0-27 podman[239609]: time="2022-10-21T16:42:36Z" level=error msg="forwarding signal 15 to container 21009bcf34c2fbd3810f7301d1693dba084d9d098cdcbd9e3b15d5dc22e3bd70: container has already been removed"
Sep 29 16:42:36 ip-10-0-0-27 systemd[1]: clean.service: Deactivated successfully.
Sep 29 16:42:36 ip-10-0-0-27 systemd[1]: Stopped clean.service - Clean on shutdown.
-- Boot d0b5eba20ebd452aae5dbffaee19eff8 --

The cleanup task runs and delays shutdown, but it runs twice. The running container process receives the SIGTERM and podman stop also sends a SIGTERM. Reproduce this by running the await script directly. Hit Ctrl-C multiple times and see cleanup invoked multiple times.

There are a few ways to address this. One is to remove the podman stop call in ExecStop - a drawback is that normal systemctl operations like systemctl stop clean.service won’t work and we need to SIGTERM the container ourselves when debugging.

Instead, let’s clear/reset the trap. It’s icky to handle signals and concurrency in shell scripts and would be much nicer with Go, but it’ll suffice for a blog post.

cleanup() {
  trap - SIGINT SIGTERM
  echo "cleaning..."
  ...
}
...
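
A variation worth knowing about is to ignore repeat signals with trap "" rather than resetting the trap to its default. As pid_namespaces(7) describes it, the kernel only delivers a signal to PID 1 of a container if a handler is installed (SIGKILL and SIGSTOP aside), so resetting is harmless there. Run directly on a host, though, a second SIGTERM after a reset would terminate the script mid-cleanup. The sketch below uses a shortened script, and the sleeper variable is a name introduced here.

```shell
#!/bin/bash
# Shortened variant of the await script; "sleeper" is a name introduced here.
child_script='
cleanup() {
  trap "" TERM INT      # ignore repeats for the rest of the script
  echo "cleaning..."
  sleep 2
  echo "done"
  kill "$sleeper"       # stop the background sleep so nothing lingers
}
trap cleanup TERM INT
sleep 30 &
sleeper=$!
wait "$sleeper"
'

log=$(mktemp)
bash -c "$child_script" >"$log" 2>&1 &
child=$!
sleep 1                  # let the child install its trap
kill -TERM "$child"      # first request starts cleanup
sleep 0.5
kill -TERM "$child"      # second request lands mid-cleanup and is ignored
wait "$child" || true    # the child exits non-zero after a trapped SIGTERM

runs=$(grep -c 'cleaning' "$log")
echo "cleanup ran $runs time(s)"
```

Either form works for the container in this post. The ignore form simply behaves the same when the script is run outside a container.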

Reload and restart clean.service, note the boot id, and try rebooting again (annotated 🔁).

Sep 29 17:01:04 ip-10-0-0-27 podman[3688]: Awaiting signals
🔁
Sep 29 17:02:20 ip-10-0-0-27 podman[3688]: cleaning...
Sep 29 17:02:20 ip-10-0-0-27 podman[3688]: Terminated
Sep 29 17:02:20 ip-10-0-0-27 systemd[1]: Stopping clean.service - Clean on shutdown...
Sep 29 17:02:25 ip-10-0-0-27 podman[3688]: waiting...
Sep 29 17:02:30 ip-10-0-0-27 podman[3688]: waiting...
Sep 29 17:02:35 ip-10-0-0-27 podman[3688]: waiting...
Sep 29 17:02:35 ip-10-0-0-27 podman[3688]: done
Sep 29 17:02:35 ip-10-0-0-27 podman[3688]: 2022-10-21 17:02:35.391304416 +0000 UTC m=+91.032365358 container died 75079c6a180f82d398bb33e1e22a45bf4b281e84912e780842dd167658e6179f (image=docker.io/library/fedora:36, name=clean, health_status=)
Sep 29 17:02:35 ip-10-0-0-27 podman[4229]: 2022-10-21 17:02:35.740391215 +0000 UTC m=+0.325653570 container remove 75079c6a180f82d398bb33e1e22a45bf4b281e84912e780842dd167658e6179f (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 17:02:35 ip-10-0-0-27 podman[4090]: clean
Sep 29 17:02:35 ip-10-0-0-27 systemd[1]: clean.service: Main process exited, code=exited, status=143/n/a
Sep 29 17:02:35 ip-10-0-0-27 systemd[1]: clean.service: Failed with result 'exit-code'.
Sep 29 17:02:35 ip-10-0-0-27 systemd[1]: Stopped clean.service - Clean on shutdown.
-- Boot f8b835d07c7f497c83487bdf3bd3e319 --

Verify cleanup only ran once (3 waiting... messages) and that it delayed shutdown.

There’s one more fix. Exit code 143 (128 + 15, the signal number of SIGTERM) indicates the container exited due to a SIGTERM graceful shutdown request. systemd marks this as an error, but we’d call it a success. Set the service’s SuccessExitStatus.

SuccessExitStatus=143

Sep 29 17:09:56 ip-10-0-0-27 podman[4981]: Awaiting signals
🔁
Sep 29 17:16:03 ip-10-0-0-27 podman[4981]: cleaning...
Sep 29 17:16:03 ip-10-0-0-27 podman[4981]: Terminated
Sep 29 17:16:03 ip-10-0-0-27 systemd[1]: Stopping clean.service - Clean on shutdown...
Sep 29 17:16:08 ip-10-0-0-27 podman[4981]: waiting...
Sep 29 17:16:13 ip-10-0-0-27 podman[4981]: waiting...
Sep 29 17:16:18 ip-10-0-0-27 podman[4981]: waiting...
Sep 29 17:16:18 ip-10-0-0-27 podman[4981]: done
Sep 29 17:16:18 ip-10-0-0-27 podman[4981]: 2022-10-21 17:16:18.636792684 +0000 UTC m=+382.160266981 container died 91888ef150ea54a9831736a966302d1811028ce3216afb4368bf0a266e61ee52 (image=docker.io/library/fedora:36, name=clean, health_status=)
Sep 29 17:16:18 ip-10-0-0-27 podman[7028]: 2022-10-21 17:16:18.968155309 +0000 UTC m=+15.157225407 container cleanup 91888ef150ea54a9831736a966302d1811028ce3216afb4368bf0a266e61ee52 (image=docker.io/library/fedora:36, name=clean, health_status=, maintainer=Clement Verna <cverna@fedoraproject.org>)
Sep 29 17:16:18 ip-10-0-0-27 podman[7028]: clean
Sep 29 17:16:19 ip-10-0-0-27 systemd[1]: clean.service: Deactivated successfully.
Sep 29 17:16:19 ip-10-0-0-27 systemd[1]: Stopped clean.service - Clean on shutdown.
-- Boot 3517a0a13b3e4885be9901666c0fd173 --
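
If you’re curious where 143 comes from, it can be reproduced without podman. When a trapped signal interrupts bash’s wait builtin, wait returns 128 plus the signal number, and a script that ends on that wait exits with the same status. The child script here is a shortened stand-in for await, and sleeper is a name introduced for the example.

```shell
#!/bin/bash
# Shortened stand-in for the await script; "sleeper" is a name introduced here.
child_script='
trap "echo cleaned; kill \$sleeper" TERM
sleep 30 &
sleeper=$!
wait "$sleeper"     # interrupted by the trapped SIGTERM
'

bash -c "$child_script" &
child=$!
sleep 1             # let the child install its trap
kill -TERM "$child"
status=0
wait "$child" || status=$?

echo "exit status: $status"
echo "signal: $(kill -l "$status")"
```

Bash’s kill -l maps the exit status back to a signal name, which is a handy way to decode statuses like this one in journal logs.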

Next Steps

We’ve seen that systemd unit shutdown can be subtle. And we’ve only focused on early shutdown - as opposed to services (or actions) that should stop as late as possible (e.g. loggers). Still, we’ve looked at several strategies.

In the next post, we’ll apply these strategies to the Kubernetes Kubelet. The Kubelet service registers itself with a Kubernetes cluster and starts containers via a container runtime, with features like preStop hooks, terminationGracePeriod, and disruption budgets. But stopping the Kubelet service doesn’t inform the cluster or stop containers and (until recently) neither would shutdown.

Source

Examples from this post are available in blog-bits under the MPL 2.0 license. They were tested on Fedora CoreOS 36.20221001.3.0 (systemd 250.8).

[Unit]
Description=Clean on shutdown
After=multi-user.target

[Service]
Type=simple
ExecStartPre=-/usr/bin/podman rm clean
ExecStart=/usr/bin/podman run \
  --name clean \
  --log-driver=k8s-file \
  --rm \
  -v /usr/local/bin:/scripts \
  --stop-timeout=60 \
  --entrypoint /scripts/await \
  docker.io/fedora:36
ExecStop=/usr/bin/podman stop clean
TimeoutStopSec=180
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target

#!/bin/bash

cleanup() {
  trap - SIGINT SIGTERM
  echo "cleaning..."
  for i in {1..3}; do
    sleep 5s
    echo "waiting..."
  done

  echo "done"
}

trap cleanup SIGINT SIGTERM

echo "Awaiting signals"
sleep infinity & wait $!

If you find a bug, please send a fix and I’ll try to get it updated.

Resources