Systemd in Production: Units, timers and sandboxing

Introduction

Systemd, hate it or love it. During my time as a professional Linux Engineer, I’ve heard a lot of opinions about Systemd. Most of them are neutral, but I’ve heard both good and bad things as well. The criticism mostly comes from senior engineers, people who’ve been in the field for a long time, who argue that Linux distributions should never have stepped away from the ‘good ol’ init system’. If you read about the UNIX philosophy, they might be right. The philosophy states that a program should do one thing and do it well. For example: grep is used for finding strings in text, so the developers of grep shouldn’t all of a sudden give it a feature to edit that text. We already have sed for that. That’s why we have those long one-liners that everyone is so impressed by.

I’ve been running Systemd in my homelab and professionally for a couple of years now. After getting to know it properly, I found out it’s so much more than just an init system. It is packed with features: journaling, service supervision, mounts, boot management, timers, and more. This is exactly why the older generation of System Administrators / Engineers dislike Systemd: it does multiple things and also makes multiple tools obsolete.

In this write-up, I’ll primarily talk about using Systemd in production: where it is a good idea to use it, and where it isn’t. I also want you, the reader, to understand why it is generally a good idea to use it.

Audience and scope

Who this is for

Linux and DevOps engineers
People who already use systemd, but don’t fully trust it yet.

What this is not

A beginner tutorial
A replacement for man systemd.service

Understanding Init-Systems

Before we can discuss why Systemd is more than an init system, we first have to understand what an init system is. Before Systemd, the majority of Linux distributions used System V init, or SysV-init for short. As the name ‘init’ suggests, it ‘initializes’ your system: in other words, it boots it up.

After the initial boot sequence of your server, the Operating System has to start up. This requires many services and scripts to start up. For example: mounting disks, setting up network connections (multiple layers), managing user sessions, starting a visual desktop environment (if you have one). And also: make sure your webserver/database/print/file/… services are started. Without an Init-system, nothing starts and therefore: nothing works.

SystemV / SysV-init

SysV-init has a couple of flaws. These range from boot times to service supervision, and from logging to resource management and isolation. I’ll summarize them briefly below.

1. Strictly sequential startup

The start sequence of SysV-init is sequential. There is a directory full of start-up scripts, which run one after the other. The order is set by (mis-)using the filename. For example: 10-network(.sh), 20-apache, 30-mysql, … . In this example MySQL always has to wait until Apache has started, and Apache can only be started after the network is up. If Apache is misconfigured and hangs during startup, the boot hangs on Apache, and MySQL and every other service that comes later never starts.

SysV-init also has no concept of why a service must start after another one. In the example above, it is clear that Apache can only be started after the network is ready. But why does MySQL have to wait for Apache? A massive downside of the SysV-init method is the lack of parallelization, which is basically impossible without fragile hacks and workarounds.

Strictly sequential startup resulted in slow boot times, weird bugs (e.g. a service starting before the network is actually ready), and startup behaviour that is difficult to reason about. The last one in particular because there weren’t any official guidelines.
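To make the ordering problem concrete, here is a minimal sketch that imitates a SysV-style start-up directory in a scratch folder. The script names are the ones from the example above; the loop is a simplified stand-in for what the rc machinery did, not the real implementation.

```shell
#!/bin/sh
# Sketch: start-up order decided purely by filename, one script at a time.
rc_dir=$(mktemp -d)

# Create three dummy init scripts, deliberately in the "wrong" order.
for name in 30-mysql 10-network 20-apache; do
    printf '#!/bin/sh\necho "starting %s"\n' "${name#*-}" > "$rc_dir/$name"
done

run_rc_dir() {
    # The shell expands the glob in sorted order, so 10-* runs before 20-*.
    # Each script must return before the next one starts: no parallelism,
    # and a script that hangs blocks everything after it.
    for script in "$1"/*; do
        sh "$script" start
    done
}

run_rc_dir "$rc_dir"
# rm -rf "$rc_dir"   # clean up when done
```

The output is always network, apache, mysql, no matter which of them actually depend on each other. The only way to express “MySQL doesn’t need Apache” is to not express anything at all.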

2. No real service supervision

The second flaw of SysV-init is about service supervision. Starting and stopping services isn’t the big deal here. The problem is supervision: monitoring whether the processes are still running. SysV-init starts a process, and after that init is done. It has started the process, but doesn’t care about its current state. If the daemon crashes later, init doesn’t know.

Therefore engineers had to add a lot of ad-hoc restart logic to the processes, but also to the server itself. Think of so-called watchdog scripts, triggered by cron. These scripts just check whether the main process is running and, if it’s not, perform an action. Because init didn’t handle the supervision of processes, things would often go wrong. A popular example of this is the stale PID file: the file still exists, but the process it points to is long gone.

The lack of supervision results in rogue processes, a lot of overhead and stale PID files. This is one of the many reasons why third-party developers built process managers like Supervisor. Supervisor is a process manager that is able to restart its subprocesses whenever they crash or get killed.
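For readers who never had to write one: this is roughly the check such a watchdog script performed. It is a minimal sketch, and the PID file it inspects is a throwaway file created for the demonstration, not a real service’s.

```shell
#!/bin/sh
# Sketch of the check a cron-driven watchdog typically performed.
pid_status() {
    pid_file=$1
    if [ ! -f "$pid_file" ]; then
        echo "stopped"            # no PID file at all
        return
    fi
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"              # file exists, process is gone
    fi
}

# Demonstration with a throwaway PID file.
demo_file=$(mktemp)
echo "$$" > "$demo_file"          # our own PID: definitely alive
pid_status "$demo_file"

sleep 0 &                         # start a short-lived process...
dead_pid=$!
wait "$dead_pid"                  # ...and wait until it has exited
echo "$dead_pid" > "$demo_file"
pid_status "$demo_file"
rm -f "$demo_file"
```

Even this small check has holes: if the PID has been reused by an unrelated process it reports “running”, and if the process belongs to another user it can report “stale”. That fragility is exactly what a real supervisor removes.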

3. Shell scripts as the definition language

The third flaw of this system I want to talk about is using shell scripts as the definition language. Init scripts are just arbitrary shell scripts. The behaviour of these shell scripts heavily depends on the shell version, environment variables and path assumptions. To know how a service is run, you have to go through the whole shell script to get a grasp of how it works.

Using shell scripts as the definition language also introduces some other flaws: services that are hard to analyze, inconsistent error handling and logging, unpredictable behaviour* and unstructured configuration*.

*: Again, there are no real official guidelines on how to design these init files. Every new init file you encounter is a new surprise.

4. No built-in logging

One of the biggest problems with SysV-init is that services log wherever they want. By convention, logs on Linux go in the /var/log directory, but there is no standard for how to organize them there. Therefore you often have multiple ‘flavours’ of this: /var/log/myapp/error.log, /var/log/myapp_error.log, or, by using syslog, /var/log/syslog. Other processes might just decide to redirect stdout to /dev/null, or log to REALLY weird places. Looking at you, SOLR: /opt/solr/server/logs

This is really inconsistent. Whenever ‘shit hits the fan’, you’ll have to find the needed log files first.

5. No integrated resource management or isolation

SysV-init has no integrated resource management or isolation. What I mean by that is that it doesn’t have a way to control CPU, memory and I/O per user, or even per service. Therefore a single service can DoS the whole system.

I always like to take web development / web services as an example, especially since that is my field of work. Imagine having a shared hosting server, with 200 websites, across 100 users. Some of these websites are static pages, meaning they barely need any CPU cycles or I/O and aren’t hogging memory. Some of those websites are dynamically built, like webshops. For each action you do on a webshop, the server has to calculate things for you. For example: a standard webshop starts a session for each visitor, shows the items that are (un)available, keeps track of expenses, keeps track of the items you’ve put in your cart, talks to the payment provider, and a lot more.

Each of these actions takes a lot of CPU cycles or memory, because the server has to continuously calculate stuff. If there is a lot of traffic, the server has to do this for multiple users simultaneously. If you have multiple webshops on the server, this takes up a lot of compute capacity. Eventually other sites on your server become unavailable, because a webshop with dog toys is consuming all the resources.

In an ideal world, you’d restrict these webshops in CPU cycles, memory and I/O. By restricting all the users in these categories, you make sure that no single site burns the server down. A popular product that does this is CloudLinux.

Why Systemd is More Than an Init System

Now that we’ve established what an Init system actually does, and what the shortcomings of SysV-init are, let’s talk about Systemd and why it is so much more than “just” a replacement for SysV-init.

Systemd was designed to solve the exact problems I described in the previous section. But the developers didn’t stop there. They built an entire ecosystem around it. When people say “Systemd does too much”, they’re not wrong - it really does a lot. But that’s also exactly the point.

Let’s break down what Systemd actually consists of.

PID 1

When your Linux system boots, the kernel needs something to hand control over to. That something becomes PID 1 - the very first user-space process on your system, and the ancestor of every other one. On a Systemd system, systemd itself is PID 1. This matters more than people realise.

Because Systemd is PID 1, it owns the entire process tree. It knows what’s running, what died, and what should be restarted. It doesn’t just fire off a process and forget about it. It maintains a relationship with every service it manages. This is fundamentally different from how SysV-init operated.

The Subsystems

Systemd ships with a collection of tightly integrated components. You’re probably using most of them without realising it:

systemd-journald - the logging daemon. All service output goes here by default. Structured, indexed, queryable. More on this later.
systemd-networkd - network configuration management. A lightweight alternative to NetworkManager, very popular on servers.
systemd-resolved - DNS resolution and caching. Handles split-horizon DNS, DNSSEC, and more.
systemd-logind - user session management. Handles seats, power buttons, lid switches.
systemd-tmpfiles - manages temporary files and directories. Cleans up /tmp on boot, creates necessary directories for services.
systemd-udevd - device event management. The thing that detects your USB stick and gives it a predictable name.
systemd-timedated - time and timezone management, controlled through the timedatectl command.

None of these are strictly required, and not all distributions enable all of them. But they’re all there, and they all integrate with each other.

cgroups Integration

This is where it gets interesting. Every service that Systemd manages gets its own cgroup (control group). A cgroup is a Linux kernel feature that lets you group processes together and apply resource limits to the whole group.

What this means in practice: if your web application spawns 40 worker processes, Systemd knows about all of them. It can limit their combined CPU usage, their total memory consumption, and their I/O bandwidth. And when you stop the service, it kills all 40 of them. No orphan processes, no stale PID files.

This is a direct solution to two of the biggest SysV-init problems I described earlier: no service supervision, and no resource isolation.
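You can see this grouping from the kernel’s side without any Systemd tooling. A minimal sketch, assuming a Linux host with /proc mounted; on a cgroup v2 system the path for a service typically looks like /system.slice/myapp.service (myapp being the hypothetical service used throughout this article).

```shell
#!/bin/sh
# Print the cgroup path(s) a process belongs to, as the kernel reports them.
cgroup_of() {
    pid=$1
    # Each line in /proc/<pid>/cgroup is "hierarchy-id:controllers:path".
    # Keep only the path, which starts at the third colon-separated field.
    cut -d: -f3- "/proc/$pid/cgroup"
}

# Show the cgroup of the current shell.
if [ -r "/proc/$$/cgroup" ]; then
    cgroup_of "$$"
else
    echo "no /proc/$$/cgroup on this system"
fi
```

Every worker a service forks inherits the same path, which is how Systemd can account for all of them and stop them together. On a host running the service, systemctl status myapp.service shows the same grouping as a process tree.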

Socket Activation

Systemd has a concept called socket activation. The idea is that Systemd creates and holds a socket open, and only starts the actual service when something connects to it. The service then inherits the socket from Systemd.

This solves a subtle but real problem: startup ordering. If service A depends on service B’s socket, Systemd can make the socket available immediately, without B being fully started yet. A can connect, the connection gets buffered, and by the time B is actually running, it processes the buffered request. No timing issues, no “wait for it to be ready” hacks.

It also means services start on-demand rather than eagerly. On a desktop, this makes boot noticeably faster. On a server, it reduces idle resource usage.
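As a sketch of what this looks like in unit files: the pair below is hypothetical (myapp, port 8080 and the paths are assumptions), and it only works if the application is written to accept a listening socket handed over by Systemd instead of opening its own. The script writes both files to a scratch directory so you can inspect them before copying them to /etc/systemd/system/.

```shell
#!/bin/sh
# Sketch: generate a socket-activated unit pair in a scratch directory.
unit_dir=${UNIT_DIR:-$(mktemp -d)}

# The socket unit: Systemd itself listens on the port.
cat > "$unit_dir/myapp.socket" <<'EOF'
[Unit]
Description=Listening socket for My App

[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target
EOF

# The service unit: started on the first connection, inherits the socket.
# It has no [Install] section: the socket, not the service, gets enabled.
cat > "$unit_dir/myapp.service" <<'EOF'
[Unit]
Description=My App (socket activated)

[Service]
ExecStart=/usr/bin/myapp
User=myapp
EOF

echo "wrote $unit_dir/myapp.socket and $unit_dir/myapp.service"
```

Once the files are in place, systemctl enable --now myapp.socket makes Systemd listen on the port; the service itself stays inactive until the first connection arrives.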

The Bottom Line

So when someone says “Systemd is just an init system”, you can correct them. It’s a process supervisor, a log aggregator, a resource controller, a device manager, a network daemon, a DNS resolver, a session manager, and a timer scheduler… All integrated, all talking to each other, all manageable through a single unified interface.

Is that a violation of the UNIX philosophy? Arguably yes. Does it solve real problems that the old approach never did? Absolutely.

Production Grade Unit Files - A Design Overview

Unit files are the way you tell Systemd about a service. At first glance, they look simple. And they are - until you start running things in production and find out that your naïve unit file doesn’t handle crashes well, starts before the network is ready, runs as root, and dumps logs into the void.

Let me walk you through building a production-grade unit file from scratch.

The Unit File Anatomy

A service unit file typically has three sections: [Unit], [Service], and [Install]. Each has a specific purpose:

[Unit] - metadata and dependencies. What is this, and what does it need?
[Service] - how to actually run the thing. What binary, as what user, with what restart behaviour?
[Install] - when should this be enabled? What target pulls this in?

Step 1 - What Most People Start With

This is the unit file you find on Stack Overflow:

[Unit]
Description=My App

[Service]
ExecStart=/usr/bin/myapp

[Install]
WantedBy=multi-user.target

It’s simple, it works, it starts the process. But… it does nothing else. Let’s fix that.

Step 2 - Choosing the Right Service Type

The Type= directive tells Systemd how to determine whether your service has successfully started. Getting this wrong leads to dependency ordering issues and misleading status output.

# Default. "Started" as soon as the process has been forked. Doesn't mean it's ready.
Type=simple
# Better default. Waits until the binary has actually been executed, so a missing binary or user is caught.
Type=exec
# For old-style daemons that fork into the background. Avoid in new software.
Type=forking
# The service calls sd_notify() when it's ready. Most reliable option.
Type=notify
# For scripts and jobs that run once and exit. Use with RemainAfterExit=yes if needed.
Type=oneshot

For most modern applications: use Type=exec. If you control the application code and can add sd_notify() support, use Type=notify. Systemd will know the exact moment your service is ready to accept connections.
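If the service is a script, or a program you can’t easily link against libsystemd, the systemd-notify command line tool can send the same readiness message. A minimal sketch, assuming a unit with Type=notify and NotifyAccess=all (the latter because the message comes from a child process of the script); the warm-up step is a placeholder.

```shell
#!/bin/sh
# Sketch of a Type=notify service written as a shell script.

notify() {
    # Only talk to systemd when we were actually started by it
    # (NOTIFY_SOCKET is set) and the tool is installed.
    if [ -n "${NOTIFY_SOCKET:-}" ] && command -v systemd-notify >/dev/null 2>&1; then
        systemd-notify "$@"
    fi
}

notify --status="Warming up"
# ... placeholder: load configuration, open connections, fill caches ...

# From this point on, systemd considers the unit started and
# releases anything that is ordered After= it.
notify --ready --status="Accepting connections"
echo "service ready"
```

Because the wrapper does nothing when NOTIFY_SOCKET is unset, the same script still runs fine from an interactive shell or on a system without Systemd.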

Step 3 - Restart Behaviour

SysV-init had no restart logic. You had to build it yourself. With Systemd, you get it for free, but the defaults need tuning.

# Restart only when the process exits non-zero, is killed by a signal, or times out
Restart=on-failure
# Alternative: restart on any exit, clean or not. An explicit `systemctl stop` is still respected.
Restart=always
# Wait 5 seconds before attempting restart
RestartSec=5s
# Window for counting start attempts (belongs in the [Unit] section)
StartLimitIntervalSec=60s
# Maximum 3 starts within the window, then give up (belongs in the [Unit] section)
StartLimitBurst=3

The StartLimitBurst and StartLimitIntervalSec combination is important. Systemd’s defaults allow 5 starts within 10 seconds, and with RestartSec=5s a crashing service never reaches that limit, so it keeps restarting forever. With the values above, Systemd gives up after three starts within a minute and marks the service as failed - which means your monitoring will catch it. Note that both directives belong in the [Unit] section, not in [Service].
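The interaction between RestartSec and the start limit is easier to see with numbers. The following is a simplified model, not Systemd’s actual code: it assumes a service that crashes immediately after every start and counts in whole seconds. It compares the values from this section with Systemd’s stock limit of 5 starts per 10 seconds.

```shell
#!/bin/sh
# Simplified model of start rate limiting for a service that crashes
# immediately after every start. All times are in whole seconds.
simulate_start_limit() {
    restart_sec=$1; burst=$2; interval=$3; horizon=$4
    t=0; window_start=0; starts=0
    while [ "$t" -le "$horizon" ]; do
        # A new counting window begins once the old one has expired.
        if [ $((t - window_start)) -ge "$interval" ]; then
            window_start=$t
            starts=0
        fi
        # Too many starts inside the window: the start is refused.
        if [ "$starts" -ge "$burst" ]; then
            echo "start limit hit at t=${t}s"
            return 0
        fi
        starts=$((starts + 1))      # the start attempt itself
        t=$((t + restart_sec))      # crash, then wait RestartSec
    done
    echo "still restarting after ${horizon}s"
}

# RestartSec=5s, StartLimitBurst=3, StartLimitIntervalSec=60s
simulate_start_limit 5 3 60 300
# RestartSec=5s with the stock limit of 5 starts per 10s
simulate_start_limit 5 5 10 300
```

With the tuned values the fourth start, 15 seconds in, is refused. With the stock limit the counting window expires before five starts can accumulate, so the model keeps restarting for the whole five minutes.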

Step 4 - Don’t Run as Root

This one should be obvious, but you’d be surprised how often it’s skipped. Create a dedicated user for your service:

useradd --system --no-create-home --shell /usr/sbin/nologin myapp

Then reference it in the unit file:

User=myapp
Group=myapp
WorkingDirectory=/opt/myapp

Your service doesn’t need root. It needs access to specific files and ports. Grant exactly that, nothing more.

Step 5 - Environment and Secrets

Never hardcode secrets in a unit file. Unit files are often world-readable under /etc/systemd/system/. Use an environment file instead:

EnvironmentFile=/etc/myapp/env
Environment=LOG_LEVEL=info

The environment file (/etc/myapp/env) should have restrictive permissions: chmod 640, owned by root:myapp. It’s a simple key=value format:

DATABASE_URL=postgres://user:password@localhost/mydb
SECRET_KEY=your-secret-here
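
A minimal sketch of creating such a file with the right permissions from the start. It writes to a scratch path so it can be tried without root; on a real host the target would be /etc/myapp/env, followed by a chown to root:myapp. The values are the placeholders from above.

```shell
#!/bin/sh
# Sketch: create an environment file that is never world-readable.
env_file=${ENV_FILE:-$(mktemp)}

# Restrict permissions BEFORE writing secrets into the file.
: > "$env_file"
chmod 640 "$env_file"

cat > "$env_file" <<'EOF'
DATABASE_URL=postgres://user:password@localhost/mydb
SECRET_KEY=your-secret-here
EOF

# On a real host (as root): chown root:myapp /etc/myapp/env
ls -l "$env_file"
```

Setting the mode first closes the short window in which a freshly created file could be readable by everyone.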

Step 6 - Logging

With Systemd, you don’t need to configure log files in your application. Just write to stdout and stderr, and journald picks it up automatically:

StandardOutput=journal
StandardError=journal
SyslogIdentifier=myapp

The SyslogIdentifier sets the identifier (tag) attached to every log line of the service. Without it, the identifier defaults to the process name - which can be confusing if you have multiple instances or wrapper scripts. Set it explicitly, and querying logs becomes predictable, whether you filter by unit (-u) or by identifier (-t):

journalctl -u myapp -f
journalctl -t myapp -f

Step 7 - Lifecycle Hooks

Systemd supports additional exec directives for different points in the service lifecycle:

# Run before start. If this fails, the service won't start.
ExecStartPre=/usr/bin/myapp --check-config
# Run after successful start.
ExecStartPost=/usr/bin/notify-monitoring.sh
# How to reload config without a full restart.
ExecReload=/bin/kill -HUP $MAINPID
# Custom stop command instead of a plain kill.
ExecStop=/usr/bin/myapp --graceful-stop

ExecStartPre is particularly useful for config validation. If your app has a --check-config flag, run it here. A misconfiguration will abort the start before anything breaks.

The Full Unit File

Putting it all together:

[Unit]
Description=My Production Application
Documentation=https://docs.example.com/myapp
After=network-online.target postgresql.service
Wants=network-online.target
Requires=postgresql.service
StartLimitIntervalSec=60s
StartLimitBurst=3

[Service]
Type=exec
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
EnvironmentFile=/etc/myapp/env
Environment=LOG_LEVEL=info

ExecStartPre=/usr/bin/myapp --check-config
ExecStart=/usr/bin/myapp --config /etc/myapp/config.yaml
ExecReload=/bin/kill -HUP $MAINPID

Restart=on-failure
RestartSec=5s

StandardOutput=journal
StandardError=journal
SyslogIdentifier=myapp

[Install]
WantedBy=multi-user.target

This is still relatively minimal. We haven’t touched sandboxing or resource limits yet - those get their own sections. But this unit file already handles: ordering, restarts, user isolation, config validation, secrets management, and structured logging. That’s a long way from the Stack Overflow version.

Dependency Management - Without Shooting Yourself

Dependency management in Systemd is one of those things that looks simple until it bites you. The directives are easy to read, but they’re easy to misread. Let me clear up the common confusion.

The Two Dimensions

Systemd dependency management has two completely separate dimensions: ordering and requirement. They are independent, and conflating them is the source of most dependency bugs.

Ordering - does A start before or after B?
Requirement - if B fails, should A also fail?

This maps to specific directives:

Directive	Dimension	Meaning
`After=`	Ordering	Start me after X has started
`Before=`	Ordering	Start me before X starts
`Wants=`	Requirement (weak)	Start X along with me, but I’ll continue if it fails or is missing
`Requires=`	Requirement (strong)	Start X along with me; if X is explicitly stopped (or, combined with After=, fails to start), I stop too
`BindsTo=`	Requirement (strict)	Like Requires=, but if X deactivates for any reason, I’m stopped immediately
`PartOf=`	Requirement (partial)	If X is stopped or restarted, I follow - but not the other way around

The Most Common Mistake

This is something I’ve seen in production more than once:

# This looks right, but it isn't.
Requires=postgresql.service

Requires= is a requirement directive. It is not an ordering directive. This line tells Systemd “start PostgreSQL whenever you start me, and if PostgreSQL is stopped, stop me too.” It does not tell Systemd “start PostgreSQL before starting me.” Without an ordering directive both units are started in parallel, so your application can still start before PostgreSQL is ready.

The correct version:

Requires=postgresql.service
After=postgresql.service

You almost always want both together. Requires= without After= is a trap.

network.target vs network-online.target

This one causes subtle, hard-to-reproduce bugs:

# network.target - the network management stack has been started. Interfaces may not be configured yet.
After=network.target

# network-online.target - the network is considered "up", typically meaning an interface has a routable address.
After=network-online.target
Wants=network-online.target

For services that need to connect to an external database, an API, or anything over the network: always use network-online.target, and pull it in with Wants= as well, because the target is only reached if something asks for it. network.target is reached very early in boot, often before your addresses and routes are even configured.

Note that network-online.target can slow down boot, because something has to declare the network as “online.” With systemd-networkd, that is the job of systemd-networkd-wait-online.service; NetworkManager ships NetworkManager-wait-online.service for the same purpose. If you’re seeing slow boot times and you’ve added network-online.target everywhere, audit whether each service actually needs it.

Soft vs Hard Dependencies

Not every dependency needs to be a hard requirement. Sometimes you want to express “start X if it’s available, but don’t fail if it isn’t”:

Wants=redis.service
After=redis.service

This is useful for optional components: a caching layer, a metrics exporter, or a sidecar service. If Redis isn’t installed, your application starts anyway. If it is installed, Systemd will try to start it first.

Circular Dependencies

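Ordering cycles (A is ordered after B, and B is ordered after A, possibly through a longer chain) are the ones that hurt; two units that merely require each other, without ordering, are fine. When Systemd finds an ordering cycle it logs a “Found ordering cycle” message and tries to break the cycle by dropping one of the start jobs, which means one of your units simply doesn’t get started. If the cycle can’t be broken, the start request fails altogether.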

Catch them before they hit production:

systemd-analyze verify /etc/systemd/system/myapp.service

This validates your unit file and checks for dependency problems. Run it as part of your deployment pipeline, not just locally.
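A minimal sketch of such a pipeline step. The directory name ./systemd is an assumption about where the unit files live in your repository; the function skips the check instead of failing when systemd-analyze isn’t installed on the build machine.

```shell
#!/bin/sh
# Sketch of a pipeline step: verify every unit file in a directory.
# Returns non-zero if any unit fails verification.
verify_units() {
    dir=$1
    if ! command -v systemd-analyze >/dev/null 2>&1; then
        echo "systemd-analyze not found, skipping verification"
        return 0
    fi
    failed=0
    for unit in "$dir"/*.service "$dir"/*.timer "$dir"/*.socket; do
        [ -e "$unit" ] || continue          # glob matched nothing
        if ! systemd-analyze verify "$unit"; then
            echo "FAILED: $unit"
            failed=1
        fi
    done
    return "$failed"
}

# Hypothetical location of the unit files in your repository.
verify_units "${UNIT_SRC_DIR:-./systemd}"
```

Keep in mind that verify also checks things like whether the ExecStart= binary exists, so run it on a machine (or image) that resembles the target host.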

Visualising the Dependency Tree

When something doesn’t start in the right order, it helps to see the full picture:

systemctl list-dependencies myapp.service           # What does myapp need?
systemctl list-dependencies myapp.service --reverse  # What depends on myapp?
systemd-analyze critical-chain myapp.service         # The longest dependency chain (why boot is slow)

These three commands have saved me a lot of guesswork.

Timers - And How They Differ From Crons

Cron has been around since 1975. It works. But it has a set of limitations that become painful once you’re running things at scale or with real reliability requirements. Systemd timers don’t replace cron in every situation, but for services already managed by Systemd, they’re almost always the better choice.

What’s Wrong With Cron

Cron’s limitations are mostly invisible until something goes wrong:

No logging by default. A cron job either sends mail (which nobody reads on a server) or disappears into silence. You run a backup job every night and have no idea whether it succeeded, failed, or silently did nothing.

No dependency management. Cron has no concept of “only run this job if the database is available” or “don’t run this if that other job is still going.”

Missed jobs disappear. If your server is down during a scheduled job, that job simply doesn’t run. No retry, no record of the miss, nothing.

No resource limits. A runaway cron job can take your server down just like any other process.

Race conditions. Nothing stops two instances of the same job from running simultaneously. For a backup script that locks a database table, this can be catastrophic.

The Timer/Service Pair

Systemd timers always come in pairs: a .timer unit and a corresponding .service unit. The timer defines when to run; the service defines what to run.

# /etc/systemd/system/db-backup.timer
[Unit]
Description=Nightly database backup

[Timer]
OnCalendar=daily
RandomizedDelaySec=900
Persistent=true

[Install]
WantedBy=timers.target

# /etc/systemd/system/db-backup.service
[Unit]
Description=Database backup job
After=postgresql.service

[Service]
Type=oneshot
User=backup
ExecStart=/usr/local/bin/db-backup.sh
StandardOutput=journal
StandardError=journal
SyslogIdentifier=db-backup

A few things worth highlighting:

Persistent=true means that if the timer was supposed to fire while the server was off, it will fire as soon as the server comes back up. This is the fix for the “missed jobs disappear” problem.

RandomizedDelaySec=900 adds a random delay of up to 15 minutes before the job runs. This sounds counterintuitive, but it’s extremely useful if you have multiple servers running the same timer - without this, they all hammer the database at exactly midnight.

Type=oneshot is the correct type for jobs that run and exit. The service goes from activating to inactive when done, which is the expected behaviour. And because a unit that is still running is not started a second time, overlapping runs of the same job can’t happen, which takes care of the race condition problem.

The OnCalendar Syntax

This is more expressive than cron notation, and importantly, it’s human-readable:

# Every day at midnight
OnCalendar=daily
# Monday at midnight
OnCalendar=weekly
# Weekdays at 9am
OnCalendar=Mon..Fri 09:00
# Every day at 03:30 (explicit notation)
OnCalendar=*-*-* 03:30:00
# Every 15 minutes
OnCalendar=*:0/15
# Every Monday at 4am (like 'weekly', but at 04:00 instead of midnight)
OnCalendar=Mon *-*-* 04:00:00

You can validate any calendar expression before deploying it:

systemd-analyze calendar "Mon..Fri 09:00"

This outputs the next scheduled fire times, so you know exactly what you’re getting before the job actually runs.
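If you keep many timers in a repository, the same check is easy to automate. A minimal sketch: the list of expressions is just the examples from this section, and the function reports “skipped” rather than failing when systemd-analyze isn’t available.

```shell
#!/bin/sh
# Sketch: check a batch of OnCalendar expressions before deploying them.
check_calendar() {
    spec=$1
    if ! command -v systemd-analyze >/dev/null 2>&1; then
        echo "skipped (no systemd-analyze): $spec"
        return 0
    fi
    # systemd-analyze exits non-zero when it cannot parse the expression.
    if systemd-analyze calendar "$spec" >/dev/null 2>&1; then
        echo "ok: $spec"
    else
        echo "INVALID: $spec"
        return 1
    fi
}

for spec in "daily" "Mon..Fri 09:00" "*:0/15"; do
    check_calendar "$spec"
done
```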

Testing and Monitoring

Unlike cron jobs, you can trigger a timer’s service unit manually without waiting for the schedule:

systemctl start db-backup.service    # Run it right now
journalctl -u db-backup.service      # Check the output
systemctl list-timers                # See all timers and their next trigger time

systemctl list-timers is something I run often. It shows every timer on the system, when it last fired, and when it will fire next. There’s no equivalent in cron.

When to Keep Using Cron

Timers aren’t free. Each one requires two unit files, and they only make full sense in environments where the service is already managed by Systemd. In a few situations, cron is still the right call:

The job is completely self-contained, has no dependencies, and runs on systems where you don’t manage other Systemd units.
You’re maintaining a script that other people also run on non-Systemd systems (BSD, macOS) and you want a single scheduling mechanism.
The system explicitly doesn’t use Systemd (containers, some embedded systems).

For everything else - especially in production - timers give you better logging, better reliability, and better integration with the rest of your service management.

Sandboxing and Security

One of the most underused features of Systemd is its built-in sandboxing. Most people are aware that you can run a service as a non-root user. Fewer know that you can also restrict which parts of the filesystem it can see, which system calls it can make, and whether it can access the network at all. All without containers, all declared in the unit file.

Let me walk through this progressively. You don’t have to enable everything at once.

Step 1 - Filesystem Restrictions

By default, a service can read from (and potentially write to) anywhere on the filesystem that its user permissions allow. That’s often much more than it actually needs.

# Makes the entire filesystem read-only for this service, except /dev, /proc and /sys
ProtectSystem=strict
# Makes /home, /root, and /run/user inaccessible and empty
ProtectHome=true
# Gives the service its own /tmp and /var/tmp, isolated from the rest of the system
PrivateTmp=true
# The only path where this service is allowed to write
ReadWritePaths=/var/lib/myapp

PrivateTmp in particular is worth enabling on almost every service. It means that two services cannot communicate through /tmp accidentally (or maliciously), and a cleanup of /tmp on service stop is automatic.

Step 2 - Capabilities

Linux capabilities are a way to give a process a subset of what root can do, without giving it full root access. Binding to port 80 requires the CAP_NET_BIND_SERVICE capability. Changing file ownership requires CAP_CHOWN. And so on.

If your service doesn’t need any capabilities (which is true for most application-level services):

# Empty = no capabilities at all
CapabilityBoundingSet=
# Cannot gain new privileges through setuid binaries or similar
NoNewPrivileges=true

NoNewPrivileges=true is cheap to enable and should be on by default for any service you write from scratch. It blocks privilege escalation through setuid/setgid binaries and file capabilities, even if the process is somehow exploited.

Step 3 - System Call Filtering

System calls are the interface between user-space processes and the Linux kernel. Most services only need a small subset of all available system calls. Systemd lets you whitelist them:

# A curated set of syscalls appropriate for daemons
SystemCallFilter=@system-service
# Only allow syscalls for the system's native architecture
SystemCallArchitectures=native

Systemd ships with predefined syscall groups (prefixed with @) that cover common use cases: @system-service, @network-io, @file-system, and more. You can see the full list with systemd-analyze syscall-filter.

Syscall filtering is probably the most powerful security feature here, and also the most likely to break things if you’re too aggressive. Start with @system-service and test thoroughly.

Step 4 - Network Isolation

For services that don’t need network access at all (batch jobs, file processors, local utilities):

# Gives the service a private network namespace with only a loopback device. No outside connectivity.
PrivateNetwork=true

For services that need network access but should be restricted:

# Only IPv4 and IPv6 sockets. This also blocks Unix domain sockets, so add AF_UNIX if the service uses them.
RestrictAddressFamilies=AF_INET AF_INET6
# Block all IP traffic, inbound and outbound, by default
IPAddressDeny=any
# Only allow traffic to and from the internal network
IPAddressAllow=10.0.0.0/8

Step 5 - Miscellaneous Hardening

A few more options that are generally safe to enable:

# Prevents changing the execution domain (ABI) of the process
LockPersonality=true
# Prevents acquiring real-time scheduling priority
RestrictRealtime=true
# Prevents setting SUID/SGID bits on files
RestrictSUIDSGID=true
# Prevents creating memory mappings that are both writable and executable
MemoryDenyWriteExecute=true

MemoryDenyWriteExecute=true is worth calling out specifically: it prevents a common class of exploit where an attacker writes shellcode into memory and then executes it. However, it will break runtimes that use JIT compilation (Java, Node.js, some Python extensions). Test before enabling.

Checking Your Security Score

Systemd has a built-in tool that analyses the security posture of a unit file and gives it a score:

systemd-analyze security myapp.service

The output looks something like this:

NAME                                                        DESCRIPTION                                                             EXPOSURE
✗ PrivateNetwork=                                           Service has access to the host network                                        0.5
✗ User=/DynamicUser=                                        Service runs as root user                                                     0.4
✓ NoNewPrivileges=                                          Service cannot acquire new privileges                                          ...

The score runs from 0.0 (maximally hardened) to 10.0 (completely open). It also gives you concrete suggestions for what to enable next. I run this on every service before it goes into production, and I aim to get the score below 4.0 for anything internet-facing.
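To turn that rule of thumb into an automated check, you can parse the summary line that systemd-analyze security prints at the end. The sketch below assumes that line looks like “→ Overall exposure level for myapp.service: 9.6 UNSAFE”, which is what recent Systemd versions print; verify this against your own version before relying on it. The 4.0 limit is the personal threshold mentioned above, not an official value.

```shell
#!/bin/sh
# Sketch: turn the exposure score into a pass/fail check for a pipeline.

# Extract the number that follows the "<unit>:" field of the summary line.
exposure_from_summary() {
    awk '/Overall exposure level/ {
        for (i = 1; i < NF; i++) if ($i ~ /:$/) { print $(i + 1); exit }
    }'
}

# Succeeds when score <= limit. awk is used because sh has no float math.
score_within_limit() {
    awk -v score="$1" -v limit="$2" 'BEGIN { exit !(score + 0 <= limit + 0) }'
}

check_unit_exposure() {
    unit=$1; limit=$2
    score=$(systemd-analyze security "$unit" 2>/dev/null | exposure_from_summary)
    if [ -z "$score" ]; then
        echo "could not determine exposure for $unit"
        return 2
    fi
    if score_within_limit "$score" "$limit"; then
        echo "$unit: exposure $score (limit $limit) OK"
    else
        echo "$unit: exposure $score exceeds limit $limit"
        return 1
    fi
}

# Example with a canned summary line, so it runs without systemd:
echo "→ Overall exposure level for myapp.service: 9.6 UNSAFE" | exposure_from_summary
```

On a real host, check_unit_exposure myapp.service 4.0 returns non-zero when the unit is more exposed than the limit, which is enough to fail a deployment job.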

Resource Control and Stability

The cgroups integration I mentioned earlier isn’t just for tracking processes - it’s also what makes resource control possible. Every service gets its own cgroup, and you can set limits on that cgroup directly from the unit file.

This is the answer to the shared hosting problem I described in the SysV-init section. You don’t need a third-party tool like CloudLinux. The kernel already has the mechanism, and Systemd exposes it cleanly.

CPU

# This service can use at most 50% of a single CPU core
CPUQuota=50%
# Relative scheduling weight. Default is 100. Higher = more CPU time when contended.
CPUWeight=100

CPUQuota is a hard cap. Set it to 200% on a 4-core machine to allow bursting across two cores. CPUWeight is more nuanced - it only matters when the CPU is actually under contention. A service with CPUWeight=200 gets roughly double the CPU time of a service with CPUWeight=100 when both are competing.
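Under the hood, the percentage becomes a time budget per scheduling period. The arithmetic below is a sketch that assumes a cgroup v2 host and the default period of 100ms, where the kernel’s cpu.max file holds the quota and the period in microseconds.

```shell
#!/bin/sh
# Sketch: translate a CPUQuota percentage into the "quota period" pair
# found in the cgroup's cpu.max file (both values in microseconds).
cpu_quota_to_cpu_max() {
    percent=${1%\%}           # strip the trailing % sign
    period_us=${2:-100000}    # assumed default period: 100ms
    echo "$((percent * period_us / 100)) $period_us"
}

cpu_quota_to_cpu_max 50%      # half a core
cpu_quota_to_cpu_max 200%     # two full cores' worth of CPU time
```

So CPUQuota=50% means 50ms of CPU time in every 100ms window, and 200% means 200ms per window, which is only possible when the work is spread over at least two cores.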

Memory

# Hard limit. If the service exceeds this and nothing can be reclaimed, it is OOM-killed.
MemoryMax=512M
# Soft limit. Above this, the kernel throttles the service and reclaims its memory aggressively.
MemoryHigh=400M
# Disallow swap usage entirely.
MemorySwapMax=0

MemoryHigh is the more interesting one. Instead of hard-killing the process, the kernel puts the service’s cgroup under memory pressure once it crosses the threshold: allocations get slower and memory is reclaimed aggressively. This gives well-behaved applications a chance to recover before hitting the hard limit.

MemorySwapMax=0 is something I enable for latency-sensitive services. Swap is death for anything that needs predictable response times. If the service can’t fit in RAM, I’d rather it crash and alert than limp along while swapping.
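
To see whether a service has actually run into these limits, the kernel keeps counters in the cgroup’s memory.events file. The sketch below assumes cgroup v2 and a unit placed in system.slice (so the path may differ on your host); memory_event is a made-up helper name.

```shell
#!/bin/sh
# memory_event NAME: read a cgroup v2 memory.events file on stdin and print the
# counter for NAME, or 0 if the kernel does not report that counter.
memory_event() {
    awk -v key="$1" '$1 == key { print $2; found = 1 } END { if (!found) print 0 }'
}

# On a real host (path is an assumption, see above):
#   events=/sys/fs/cgroup/system.slice/myapp.service/memory.events
#   echo "throttled at MemoryHigh: $(memory_event high < "$events")"
#   echo "processes OOM-killed:    $(memory_event oom_kill < "$events")"
```

A steadily climbing high counter suggests MemoryHigh is set too close to the service’s normal working set.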

I/O

# Lower I/O priority than default (100)
IOWeight=50
# Cap read throughput at 50MB/s
IOReadBandwidthMax=/dev/sda 50M
# Cap write throughput at 20MB/s
IOWriteBandwidthMax=/dev/sda 20M

I/O limits are particularly relevant for backup jobs, log processors, and anything that does bulk data movement. A poorly written backup script can saturate your disk and cause everything else on the host to crawl. Setting IOWeight low, or capping the bandwidth, keeps it from doing that.
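
One way to try such limits before committing them to a unit file is a transient unit via systemd-run. This is a sketch under assumptions: the wrapper name run_throttled and the backup script path are hypothetical, /dev/sda is taken from the example above, and the limits only take effect where the cgroup I/O controller is available.

```shell
#!/bin/sh
# run_throttled CMD [ARGS...]
# Run a one-off command in a transient service with a low I/O weight and a write cap.
# Set DRY_RUN=1 to print the systemd-run command instead of executing it.
run_throttled() {
    ${DRY_RUN:+echo} systemd-run --wait --collect \
        -p IOWeight=50 \
        -p "IOWriteBandwidthMax=/dev/sda 20M" \
        "$@"
}

# Example (hypothetical script path, run as root):
#   run_throttled /usr/local/bin/backup.sh
```

Once the numbers look right, the same two settings can move into the backup job’s unit file unchanged.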

Tasks

# Maximum number of processes and threads this service can spawn
TasksMax=64

This is your fork bomb protection. A service that goes haywire and forks uncontrollably will hit this limit, at which point new forks simply fail instead of taking the host down with them. The default on most systems is fairly generous (around 15% of the total system limit), but for services that should only be running a handful of threads, setting it explicitly makes the limit obvious and intentional.

Monitoring Live Resource Usage

Setting limits is one thing. Actually watching what your services consume is another:

systemctl status myapp.service     # Shows current CPU and memory from the cgroup
systemd-cgtop                      # Real-time resource usage by cgroup, like `top` but per-service

systemd-cgtop is one of those tools I wish more people knew about. It gives you a live view of which services are consuming CPU, memory, and I/O, sorted by usage. When something is wrong on a host and you’re not sure what’s consuming resources, this (and, okay, htop) is the first thing I open.
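
For scripts and alerts, the same numbers are available as unit properties. A minimal sketch of comparing usage against a configured limit follows; usage_pct and is_number are invented helper names, and the property names in the usage notes (TasksCurrent, TasksMax, MemoryCurrent, MemoryMax) are what systemctl show reports on the systems I know, so check yours.

```shell
#!/bin/sh
# is_number VALUE: succeed only for a non-empty string of digits.
is_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
}

# usage_pct CURRENT MAX: print CURRENT as a whole-number percentage of MAX,
# or "n/a" when either value is not a plain number (e.g. "infinity" or "[not set]").
usage_pct() {
    if is_number "$1" && is_number "$2" && [ "$2" -gt 0 ]; then
        echo $(( $1 * 100 / $2 ))
    else
        echo "n/a"
    fi
}

# On a real host:
#   current=$(systemctl show -p TasksCurrent --value myapp.service)
#   max=$(systemctl show -p TasksMax --value myapp.service)
#   echo "tasks: $(usage_pct "$current" "$max")% of limit"
```

The "n/a" branch matters in practice, because a unit without an explicit limit does not report a number there.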

Debugging Systemd in Production

Something is broken. A service isn’t starting, or it’s starting and then dying, or it started successfully but isn’t behaving correctly. Here’s the workflow I follow.

Step 1 - Get the Current Status

systemctl status myapp.service

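This gives you: the current state (active, failed, activating), the last few log lines, the PID, and the cgroup. It’s often enough to spot the problem immediately. Look for the exit code if the service failed: code=exited, status=1/FAILURE means the process exited on its own with a non-zero status, so the application’s own log output should explain why. code=killed, signal=SEGV means it was terminated by a signal, in this case a segmentation fault, which points at a crash rather than a handled error.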
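
The same exit information can be read in a script through unit properties. This is a sketch under assumptions: it relies on ExecMainCode carrying the Linux siginfo code (1 = exited, 2 = killed, 3 = dumped core) and ExecMainStatus carrying the exit status or signal number, and explain_exit is a made-up helper name.

```shell
#!/bin/sh
# explain_exit CODE STATUS
# CODE is the value of ExecMainCode, STATUS the value of ExecMainStatus.
explain_exit() {
    case "$1" in
        0) echo "main process has not exited" ;;
        1) echo "exited with status $2" ;;
        2) echo "killed by signal $2" ;;
        3) echo "killed by signal $2 (core dumped)" ;;
        *) echo "unrecognised code $1 (status $2)" ;;
    esac
}

# On a real host:
#   code=$(systemctl show -p ExecMainCode --value myapp.service)
#   status=$(systemctl show -p ExecMainStatus --value myapp.service)
#   explain_exit "$code" "$status"
```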

Step 2 - Check the Full Logs

journalctl -u myapp.service -b          # All logs from this boot
journalctl -u myapp.service -b -1       # All logs from the previous boot (useful after a crash/reboot)
journalctl -u myapp.service -f          # Follow logs in real time
journalctl -u myapp.service -n 100      # Last 100 lines

The -b flag scopes to the current boot. Without it, you’ll see the service’s logs from every boot the journal still holds, which can be overwhelming. -b -1 is particularly useful if the service is crashing during startup and rebooting the server. The current boot won’t have much, but the previous one will have the full failure. Note that earlier boots are only available if the journal is stored persistently (in /var/log/journal).

Step 3 - Filter by Severity

journalctl -u myapp.service -p err      # Errors and anything more severe
journalctl -p err -b                    # Errors and worse from all services, this boot

When you’re looking for a needle in a haystack, filtering by priority helps. Levels follow syslog convention: emerg, alert, crit, err, warning, notice, info, debug.

Step 4 - Validate the Unit File

If the service isn’t starting at all, check whether the unit file itself is valid:

systemd-analyze verify /etc/systemd/system/myapp.service

This catches syntax errors, missing dependencies, and unit files that reference executables which don’t exist. It’s the first thing I run after writing or modifying a unit file.

Step 5 - Check the Dependency Tree

If the service is failing because something it depends on isn’t ready:

systemctl list-dependencies myapp.service           # Full dependency tree
systemctl list-dependencies myapp.service --failed  # Only the failed dependencies

Step 6 - Understand Boot Timing

If something is timing out or starting in the wrong order:

systemd-analyze blame                          # How long each unit took to start
systemd-analyze critical-chain myapp.service   # The dependency chain responsible for myapp's start time
systemd-analyze plot > boot.svg                # Full visual timeline of the boot (open in a browser)

systemd-analyze plot generates an SVG timeline of the entire boot sequence. It’s one of the most useful diagnostic tools in the Systemd toolkit, and almost nobody uses it.

Making Changes Without Overwriting Unit Files

If you need to adjust a unit file that was installed by a package manager, don’t edit the file directly. Upstream updates will overwrite your changes. Use a drop-in override instead:

systemctl edit myapp.service

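This opens an editor and creates /etc/systemd/system/myapp.service.d/override.conf. Only the directives you set here will override the original unit file - everything else is inherited. systemctl edit reloads the Systemd configuration for you when you save. If you create or change the drop-in by hand, reload it yourself, and restart the service either way so the running process picks up the change: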

systemctl daemon-reload
systemctl restart myapp.service
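
One detail trips people up in drop-ins: settings that accept a list, such as ExecStart=, are appended to rather than replaced, so the inherited value has to be cleared with an empty assignment first. A minimal sketch of what such a drop-in could contain, written by a script instead of an editor; write_dropin and the myapp paths are hypothetical.

```shell
#!/bin/sh
# write_dropin DIR UNIT: create DIR/UNIT.d/override.conf, the same file
# `systemctl edit UNIT` would create under /etc/systemd/system.
write_dropin() {
    dropin_dir="$1/$2.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# List-type settings such as ExecStart= are appended to, not replaced,
# so clear the inherited value with an empty assignment first.
ExecStart=
ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.yml
# Single-value settings simply override the packaged value.
MemoryMax=512M
EOF
}

# On a real host (run as root):
#   write_dropin /etc/systemd/system myapp.service
#   systemctl daemon-reload && systemctl restart myapp.service
```

systemctl cat myapp.service then shows the packaged unit followed by the drop-in, which is a quick way to confirm the override was picked up.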

Useful journalctl Filters

journalctl --since "2026-01-15 14:00:00" --until "2026-01-15 15:00:00"   # Time range
journalctl -u myapp -o json-pretty                                         # Readable JSON (use -o json, one entry per line, for log aggregation pipelines)
journalctl -u myapp --no-pager | grep "ERROR"                              # Plain text grep

Patterns I Like and Patterns I Avoid

After running Systemd in production for a few years, I’ve developed opinions. Here are the ones I feel strongly enough about to write down.

Patterns I Like

Drop-in overrides instead of editing unit files. Whenever I need to customize a unit file installed by a package, I use systemctl edit to create a drop-in override. This means package updates don’t blow away my changes, and the diff between the original and my customization is explicit and reviewable.

ExecStartPre= for config validation. If the application supports a config check flag, I wire it into ExecStartPre=. A misconfiguration causes a clean failure before the service even attempts to start, rather than a confusing runtime crash ten seconds in. This has caught more issues in deployment than I can count.

Dedicated service accounts with no login shell. Every service gets its own user created with --system --no-create-home --shell /usr/sbin/nologin. No exceptions. It takes thirty seconds and it meaningfully limits the blast radius if something goes wrong.

Persistent=true on all timers. If the server is down when a scheduled job was supposed to run, I want it to run when the server comes back up. The only timers where I deliberately don’t set this are ones where running late would cause more damage than not running at all - which is rare.

EnvironmentFile= with restrictive permissions. Secrets go in an environment file with chmod 640 and chown root:serviceuser. The unit file itself is readable by anyone. The environment file isn’t.
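
Put together, several of these patterns fit in one short unit file. The following is a sketch, not a drop-in for any real application: the myapp user, the paths and the --check-config flag are placeholders, and write_service is a made-up helper that only writes the file so it can be reviewed.

```shell
#!/bin/sh
# write_service DIR: write an example myapp.service into DIR.
# User name, paths and the --check-config flag are hypothetical placeholders.
write_service() {
    cat > "$1/myapp.service" <<'EOF'
[Unit]
Description=My application

[Service]
# Dedicated service account, created once with useradd (see below).
User=myapp
# Secrets live here, not in Environment= lines.
EnvironmentFile=/etc/myapp/env
# Fails the start cleanly if the configuration is invalid.
ExecStartPre=/usr/local/bin/myapp --check-config /etc/myapp/config.yml
ExecStart=/usr/local/bin/myapp --config /etc/myapp/config.yml

[Install]
WantedBy=multi-user.target
EOF
}

# One-time host setup (run as root; assumes useradd also creates a myapp group):
#   useradd --system --no-create-home --shell /usr/sbin/nologin myapp
#   install -m 0640 -o root -g myapp /dev/null /etc/myapp/env
```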

Patterns I Avoid

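Restart=always without rate limiting. Restart=always combined with a bug that causes an immediate crash gives you a rapid restart loop. Systemd’s stock start limit (5 starts within 10 seconds) will eventually stop it and leave the unit in a failed state, but I don’t want to depend on defaults for something this important. Always pair it with RestartSec=, StartLimitBurst= and StartLimitIntervalSec= so the behaviour is deliberate. Better yet, use Restart=on-failure unless you have a specific reason to restart after a clean exit.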

After=network.target when you need network-online.target. I’ve debugged this specific issue in production more than once. The service starts, tries to connect to the database, fails because the route isn’t configured yet, and then either crashes or goes into a retry loop. If you need the network, say so properly: set both Wants=network-online.target and After=network-online.target, because After= alone only orders the units and doesn’t pull the target in.

Secrets in Environment= directives. Anything you put in Environment= in a unit file is visible in systemctl show myapp.service and potentially in process listings. Use EnvironmentFile= and control the file permissions instead.

KillMode=none. This tells Systemd to leave all processes alone when stopping the service. It’s occasionally useful for services that manage their own shutdown, but it’s a footgun, and recent Systemd versions log a deprecation warning for it. If you use it, you are responsible for ensuring every process in that cgroup actually stops.

Overusing Requires= where Wants= is sufficient. If a dependency is truly optional (a cache, a metrics endpoint, a secondary store), use Wants=. Using Requires= means your service is stopped or restarted whenever that dependency is stopped or restarted, and, combined with After=, it won’t start at all if the dependency fails to start. That’s often not what you actually want.
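
The alternatives to most of these anti-patterns fit in one small drop-in. A minimal sketch follows; cache.service is a hypothetical optional dependency, write_restart_dropin is a made-up helper, and the numbers are illustrative rather than recommendations.

```shell
#!/bin/sh
# write_restart_dropin DIR: write DIR/myapp.service.d/restart.conf.
write_restart_dropin() {
    dropin_dir="$1/myapp.service.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Unit]
# Pull in and wait for a configured network, not just the network stack.
Wants=network-online.target
After=network-online.target
# Optional dependency (hypothetical unit name): started alongside, not required.
Wants=cache.service
# Give up after 5 starts within 60 seconds. These two belong in [Unit].
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Restart only after failures, and wait between attempts.
Restart=on-failure
RestartSec=5
EOF
}

# On a real host (run as root):
#   write_restart_dropin /etc/systemd/system && systemctl daemon-reload
```

With these values a crash-looping service gets five attempts a few seconds apart and then stays failed, which is usually the moment you want an alert rather than a sixth attempt.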

When Not to Use Systemd

I’ve spent most of this write-up explaining why Systemd is good and how to use it properly. But it’s not the right tool for every situation. Here’s where I’d reach for something else.

You’re Running Everything in Containers

If your services run in Docker, Kubernetes, or a similar container environment, the container runtime is already your process manager. Systemd inside a container is almost always the wrong answer. There are edge cases: running Systemd as PID 1 inside a container is possible and sometimes done for testing. For production, though, let your orchestrator handle restarts, health checks, and resource limits. It’s what it’s built for.

The one exception: if you’re running a VM-based workload (not containers) and managing the guest from inside, Systemd is still the right choice.

Truly Simple, One-Off Tasks

Not everything needs a service. A script that runs once during a deployment, a database migration, a file cleanup job with no timing requirements: these don’t need a unit file. They need to be called from your deployment tooling directly. Turning every script into a Systemd unit adds overhead without adding value.

Resource-Constrained Systems

On embedded systems or environments with very limited RAM, Systemd’s footprint can be too large. runit, s6, and OpenRC are all legitimate alternatives that provide process supervision with a much smaller memory footprint. Alpine Linux, for example, uses OpenRC partly for this reason.

The common thread: Systemd makes most sense when you’re managing long-running services on a standard Linux server, especially when those services have dependencies on each other, need reliable restart behaviour, and benefit from structured logging. Outside of that context, evaluate the alternatives on their own merits.

Final Thoughts

The debate around Systemd has always felt slightly off to me. The arguments against it - that it violates the UNIX philosophy, that it does too much, that it’s opaque - are all technically true. But they miss the point of what it replaced.

SysV-init was simple, yes. It was also fragile, inconsistent, and full of gaps that every distribution, every package maintainer, and every system administrator filled in differently. The ecosystem of init scripts, watchdog crons, PID file conventions, and ad-hoc logging setups that grew around it wasn’t simple at all. It was an accretion of workarounds.

Systemd made a different bet: that consolidating these concerns into a coherent, well-specified system, even a complex one, would be better than the fragmented landscape it replaced. On balance, I think it was right.

What I’ve seen in practice is that most engineers are using maybe twenty percent of what Systemd offers. They start and stop services, check status occasionally, maybe look at logs. The rest (sandboxing, resource control, socket activation, timers, dependency graphs) sits unused because nobody told them it was there.

That gap is expensive. Services run with more privilege than they need. Jobs run without rate limiting and occasionally bring hosts to their knees. Debugging takes hours because there’s no structured logging to query. Problems that Systemd would have caught or prevented cause incidents instead.

Learning Systemd properly is a relatively small investment. The documentation is thorough, the tooling is good, and most of the important concepts build on each other in a logical way. What you get in return - reliable services, observable failures, a consistent security posture, and the ability to actually reason about what’s running on your hosts - is worth it.

You’re already running it. You might as well understand it.