
How to monitor cron jobs: 5 patterns with heartbeat checks


Cron jobs fail silently. Five heartbeat patterns - from dead man's switch to exit-code wrapper to grace windows - so you hear about failures first.


Cron is brilliant at running things on schedule and terrible at telling you when it does not. A backup that stops running rarely throws an error you will see. It just quietly does nothing, and you find out weeks later when you need the backup and it is not there. This is the silent failure problem, and it is why every scheduled job worth caring about needs a heartbeat check.

A heartbeat (or "dead man's switch") flips the logic around. Instead of waiting for the job to report a failure, the job pings a monitoring endpoint every time it succeeds. If the expected ping does not arrive on time, the monitor alerts. No ping means trouble.
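
To make the inverted logic concrete, here is a minimal sketch of the check a monitor performs on its side, written as a hypothetical bash function. Real monitoring tools do this internally. The name `check_heartbeat`, its argument order, and the optional grace argument (the buffer Pattern 5 discusses) are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of the monitor's side of a dead man's switch.
# All values are Unix timestamps or durations in seconds:
#   $1 last_ping  when the last heartbeat arrived
#   $2 period     how often the job is expected to ping
#   $3 grace      optional extra buffer before alerting (default 0)
#   $4 now        optional current time (defaults to the system clock)
check_heartbeat() {
  local last_ping=$1 period=$2 grace=${3:-0}
  local now=${4:-$(date +%s)}
  local deadline=$(( last_ping + period + grace ))
  if (( now > deadline )); then
    echo "ALERT"   # the expected ping is overdue: silence is the signal
  else
    echo "ok"
  fi
}
```

For a daily job (period 86400 seconds) whose last ping arrived at 1700000000, `check_heartbeat 1700000000 86400 0 1700090000` prints ALERT, because the deadline of 1700086400 has already passed.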

Below are five patterns, from simplest to most robust. All assume you have a heartbeat URL from your monitoring tool.

Pattern 1: The classic dead man's switch

The simplest possible setup. Your job hits a URL after it finishes. If the monitor does not hear from it within the expected window, it alerts.

# Daily backup at 02:30, ping on completion
30 2 * * * /usr/local/bin/backup.sh && curl -fsS https://your-monitor/ping/abc123

The && matters: curl only runs if backup.sh exits 0 (success). If the backup fails, no ping fires, and the monitor catches it. In -fsS, -f makes curl exit non-zero on HTTP errors instead of treating an error page as success, -s hides the progress meter, and -S still prints an error message if the request fails.
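
A ping that hangs on a slow network can stall the job, and a single dropped request can trigger a false "missing" alert. Below is a sketch of a small helper that adds a timeout and retries. The function name `ping_monitor` is hypothetical; the flags are standard curl options.

```shell
#!/bin/bash
# Hypothetical helper: send a heartbeat ping with a timeout and retries.
#   -f            exit non-zero on HTTP errors (4xx/5xx)
#   -sS           hide the progress meter but still print error messages
#   -m 10         give up on any single attempt after 10 seconds
#   --retry 3     retry transient failures (e.g. timeouts, HTTP 503) up to 3 times
#   -o /dev/null  discard the response body (otherwise cron may mail it to you)
ping_monitor() {
  local url=$1
  curl -fsS -m 10 --retry 3 -o /dev/null "$url"
}
```

Inside a script, call `ping_monitor https://your-monitor/ping/abc123` wherever Pattern 1 calls curl directly. In a one-line crontab entry, the same flags can go straight onto the curl command. The 10-second timeout and three retries are starting points, not values prescribed by any particular monitoring tool.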

Weakness: it only confirms the script ran and exited zero. If the script swallows its own errors and exits 0 anyway, you get a false "all good."

Pattern 2: Push only on genuine success

Make the success signal explicit instead of relying on the shell's exit chaining. This is clearer in scripts with multiple steps.

#!/bin/bash
set -euo pipefail    # abort on failed commands, unset variables, and failed pipeline stages
run_the_backup       # placeholder for your backup command(s)
verify_the_backup    # placeholder: check the output exists and is sane
curl -fsS https://your-monitor/ping/abc123   # only reached if everything above succeeded

set -euo pipefail aborts the script before the ping if a command fails, an unset variable is referenced, or any stage of a pipeline fails. Adding a real verification step (does the backup file exist, is it bigger than zero bytes) closes the "exited zero but did nothing" hole from Pattern 1.
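
Here is one possible shape for the verification step, assuming the backup is a single gzip-compressed file. The function name `verify_backup` and the gzip format are assumptions; adapt the checks to whatever your job actually produces.

```shell
#!/bin/bash
# Hypothetical verification step for Pattern 2.
# Assumes the backup is one gzip-compressed file passed as $1.
verify_backup() {
  local file=$1
  # -s is true only if the file exists and is larger than zero bytes
  if [ ! -s "$file" ]; then
    echo "backup missing or empty: $file" >&2
    return 1
  fi
  # gzip -t checks archive integrity without extracting anything
  if ! gzip -t "$file" 2>/dev/null; then
    echo "backup is not a valid gzip archive: $file" >&2
    return 1
  fi
}
```

In the Pattern 2 script, a line such as `verify_backup /var/backups/db.sql.gz` (a hypothetical path) would take the place of `verify_the_backup`. Under `set -e`, a non-zero return aborts the script before the ping.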

Pattern 3: The exit-code wrapper

When you cannot edit the job itself - a third-party binary, a vendor script - wrap it. The wrapper signals start, runs the job, and reports the exit code.

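#!/bin/bash
# No set -e here: the final ping must run even when the job fails.
URL=https://your-monitor/ping/abc123
curl -fsS "$URL/start"                   # signal "I have started"
/opt/vendor/report-generator             # the job you do not control
EXIT=$?                                  # capture its exit code immediately
curl -fsS "$URL/$EXIT"                   # report the exit code (0 = success)
exit "$EXIT"                             # pass the job's result back to cron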

If your monitoring tool accepts start and exit-code pings like these, signalling start lets the monitor also measure run duration and detect a job that started but never finished (hung). Reporting the exit code lets it distinguish "ran fine" from "ran and failed."
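
If you wrap several jobs, a reusable function saves copying the script for each one. This sketch assumes, like the wrapper above, that your monitor accepts `/start` and `/<exit code>` suffixes on the ping URL. The function name `run_with_heartbeat` and the curl timeouts are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical reusable wrapper: run_with_heartbeat URL COMMAND [ARGS...]
# Assumes the monitor understands URL/start and URL/<exit code>.
run_with_heartbeat() {
  local url=$1
  shift
  # "|| true": a monitoring outage must not stop the real job
  curl -fsS -m 10 --retry 3 -o /dev/null "$url/start" || true
  local code=0
  "$@" || code=$?            # run the job and keep its exit code
  curl -fsS -m 10 --retry 3 -o /dev/null "$url/$code" || true
  return "$code"             # the caller sees the job's own result
}
```

A script would call it as `run_with_heartbeat https://your-monitor/ping/abc123 /opt/vendor/report-generator`. Because both pings end in `|| true`, the function is also safe to call from scripts that use `set -e`.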

Pattern 4: Conditional ping (only ping when the work mattered)

Some jobs run on schedule but only do work sometimes - a queue processor that often finds the queue empty. You do not want a "missing heartbeat" alert when the job legitimately did nothing. Ping only when real work happened, and widen the monitor's expected period to match.

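#!/bin/bash
set -euo pipefail                 # if process_queue fails, abort before any ping
PROCESSED=$(process_queue)        # placeholder: must print the number of items handled
if [ "${PROCESSED:-0}" -gt 0 ]; then
  curl -fsS "https://your-monitor/ping/abc123?count=$PROCESSED"
fi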

Configure the monitor's expected period to the longest realistic gap between real runs. The trade-off is sensitivity: a longer window means slower detection, so use this only when intermittent work is genuinely expected.
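
Pattern 4 relies on `process_queue` printing a count, and nothing else, on stdout. Below is a hypothetical sketch of that contract, assuming a spool directory where each file is one work item. The default path and the move-to-`done/` step are illustrative, not part of any particular queue system.

```shell
#!/bin/bash
# Hypothetical process_queue: each file in the spool directory is one work
# item. Handles every item, moves it to done/, and prints the count.
process_queue() {
  local spool=${1:-/var/spool/myqueue}
  local count=0 item
  mkdir -p "$spool/done"
  for item in "$spool"/*; do
    [ -f "$item" ] || continue     # skip done/ and an unmatched glob
    # ... real work on "$item" would go here (log to stderr, not stdout) ...
    mv "$item" "$spool/done/"
    count=$((count + 1))
  done
  echo "$count"                     # stdout carries only the count
}
```

Keeping stdout limited to the count matters because Pattern 4 captures it with `$(...)`. Any log line printed to stdout would break the numeric comparison.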

Pattern 5: Jitter and grace windows

Real schedules drift. A job set for 02:00 might start at 02:00:04 because of load, and a daily job's runtime varies. If your monitor expects a ping at exactly the interval, normal jitter will page you for nothing.

The fix is a grace period: tell the monitor to wait an extra buffer past the expected time before alerting. A daily backup that usually takes 8 minutes might use a 20-minute grace window. You also want some jitter in scheduling itself when you have many jobs, so they do not all hammer a resource at the same second:

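# cron runs commands with /bin/sh by default; $RANDOM needs bash
SHELL=/bin/bash
# Add a small random delay (0-59s) so 02:00 jobs spread out.
# In a crontab, % must be escaped as \% or cron turns it into a newline.
0 2 * * * sleep $((RANDOM \% 60)) && /usr/local/bin/backup.sh && curl -fsS https://your-monitor/ping/abc123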
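
A random delay changes every night. An alternative, sometimes called splay, derives a stable offset from a name such as the hostname, so each machine keeps the same delay and its ping arrives at a predictable time. This is a swapped-in technique, not part of the crontab line above, and the function name `stable_offset` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: a stable delay in [0, max) derived from a name.
# The same name always yields the same offset; different names spread out.
stable_offset() {
  local name=$1 max=${2:-60}
  local sum
  # cksum prints "CRC SIZE"; keep only the CRC
  sum=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
  echo $(( sum % max ))
}
```

In the script cron runs, `sleep "$(stable_offset "$(hostname)")"` before the job does the spreading. Putting it in a script also keeps `%` out of the crontab line.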

Grace windows are the difference between a monitor you trust and one you mute after the third false alarm. Set the grace to comfortably exceed the job's worst realistic runtime (plus any deliberate start delay), not its average.
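
To put a number on "worst realistic runtime", a small helper can derive a suggestion from recorded durations. This is a sketch: the function name `suggest_grace`, the input format (one duration in seconds per argument), and the 50% margin are assumptions to tune, not a standard rule.

```shell
#!/bin/bash
# Hypothetical helper: suggest a grace window (seconds) from past runtimes.
# Uses the worst observed run, not the average, plus a 50% safety margin.
suggest_grace() {
  local max=0 d
  for d in "$@"; do
    if (( d > max )); then
      max=$d
    fi
  done
  echo $(( max + max / 2 ))
}
```

With runs of 480, 510, 900 and 600 seconds, `suggest_grace 480 510 900 600` prints 1350 (22.5 minutes): the occasional 15-minute run, not the typical 8 minutes, sets the window.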

Quick reference

Pattern | Best for | Catches
--- | --- | ---
Dead man's switch | Simple, single-command jobs | Job did not run or exited non-zero
Push on success | Multi-step scripts | Silent "exited zero but did nothing"
Exit-code wrapper | Jobs you cannot edit | Hangs, real exit codes, duration
Conditional ping | Intermittent-work jobs | Failures, without false "missing" alerts on idle runs
Jitter and grace | Anything with variable runtime | Reduces false alarms from normal drift

Putting it into practice

Start with Pattern 1 on your most important job today - a backup, a billing run, a data sync. Add verification (Pattern 2) and a grace window (Pattern 5) before you rely on it. For a deeper walkthrough of heartbeat configuration, see our guide on monitoring cron jobs with heartbeats.

If your job talks to a service over a specific port, you can sanity-check reachability first with the free port checker. And when you are ready to monitor scheduled jobs alongside your websites and APIs in one place, see how ePulz.io uptime monitoring works - the 7-day trial is free, no card needed.

