How to monitor cron jobs: 5 patterns with heartbeat checks
Cron jobs fail silently. Five heartbeat patterns - from dead man's switch to exit-code wrapper to grace windows - so you hear about failures first.
Cron is brilliant at running things on schedule and terrible at telling you when it does not. A backup that stops running rarely throws an error you will see. It just quietly does nothing, and you find out weeks later when you need the backup and it is not there. This is the silent failure problem, and it is why every scheduled job worth caring about needs a heartbeat check.
A heartbeat (or "dead man's switch") flips the logic around. Instead of alerting when something is wrong, your job pings a monitoring endpoint when things go right. If the expected ping does not arrive on time, the monitor alerts. No ping, no news, big problem.
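To make the logic concrete, here is a minimal sketch of the check a monitor runs on its side. The function name `is_overdue` and its arguments are illustrative assumptions, not any particular tool's API; all times are Unix epoch seconds.

```shell
# Hypothetical monitor-side check: has the job missed its expected ping?
# Arguments: $1 = time of last ping, $2 = expected period in seconds,
#            $3 = current time (both times in epoch seconds).
is_overdue() {
    # Overdue once more than one full period has passed since the last ping.
    [ $(( $3 - $1 )) -gt "$2" ]
}

# Example: a daily job (86400 s) whose last ping arrived 25 hours ago.
now=$(date +%s)
if is_overdue $((now - 90000)) 86400 "$now"; then
    echo "ALERT: heartbeat missing"
fi
```

Real monitors layer more on top of this (grace periods, notification routing), but the core comparison is about this simple.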
Below are five patterns, from simplest to most robust. All assume you have a heartbeat URL from your monitoring tool.
Pattern 1: The classic dead man's switch
The simplest possible setup. Your job hits a URL after it finishes. If the monitor does not hear from it within the expected window, it alerts.
# Daily backup at 02:30, ping on completion
30 2 * * * /usr/local/bin/backup.sh && curl -fsS https://your-monitor/ping/abc123
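The && matters: curl only runs if backup.sh exits 0 (success). If the backup fails, no ping fires, and the monitor catches it. Each curl flag does one job: -f makes curl exit non-zero on an HTTP error status (400 or above) instead of treating the error page as success, -s hides the progress meter, and -S still prints an error message when the request fails.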
Weakness: it only confirms the script ran and exited zero. If the script swallows its own errors and exits 0 anyway, you get a false "all good."
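One practical refinement to the ping itself, offered as a suggestion rather than a requirement: give curl a timeout and a few retries, so a brief network blip does not turn into a missed heartbeat. `-m` (max time per transfer), `--retry`, and `-o` are standard curl options; the function name `ping_monitor` is made up for this sketch, and the URL is the same placeholder as above.

```shell
# Send a heartbeat, capping each attempt at 10 seconds and retrying
# up to 3 times on transient errors. The response body is discarded
# so a successful ping produces no output.
ping_monitor() {
    curl -fsS -m 10 --retry 3 -o /dev/null "$1"
}

# Usage inside a script: ping only if the backup succeeded.
# /usr/local/bin/backup.sh && ping_monitor https://your-monitor/ping/abc123
```

The same flags can be added directly to the one-line crontab entry if you prefer not to keep a helper script.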
Pattern 2: Push only on genuine success
Make the success signal explicit instead of relying on the shell's exit chaining. This is clearer in scripts with multiple steps.
#!/bin/bash
set -euo pipefail # fail loudly on any error
run_the_backup
verify_the_backup # actually check the output exists and is sane
curl -fsS https://your-monitor/ping/abc123 # only reached if all above passed
set -euo pipefail aborts the script before the ping if a command fails (-e), an unset variable is used (-u), or any stage of a pipeline fails (-o pipefail). Adding a real verification step (does the backup file exist, is it bigger than zero bytes) closes the "exited zero but did nothing" hole from Pattern 1.
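What the verification step checks depends on the job. As one possible shape, assuming the backup is a single file, the sketch below checks that the file exists, meets a minimum size, and was written recently. The argument order and thresholds are assumptions for illustration, and `find -mmin` is available in GNU and BSD find but is not strictly POSIX.

```shell
# Hypothetical body for verify_the_backup.
# Arguments: $1 = backup file, $2 = minimum size in bytes,
#            $3 = maximum age in minutes.
verify_the_backup() {
    file=$1
    min_bytes=$2
    max_age_min=$3
    # The file must exist as a regular file.
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips the padding some wc versions add.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min_bytes" ] || return 1
    # find prints the path only if it was modified less than
    # max_age_min minutes ago, so empty output means "too old".
    [ -n "$(find "$file" -mmin "-$max_age_min")" ]
}
```

A call such as `verify_the_backup /var/backups/db.sql.gz 1024 60` (path and numbers invented here) returns non-zero on any failed check, which under `set -e` stops the script before the ping.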
Pattern 3: The exit-code wrapper
When you cannot edit the job itself - a third-party binary, a vendor script - wrap it. The wrapper signals start, runs the job, and reports the exit code.
#!/bin/bash
URL=https://your-monitor/ping/abc123
curl -fsS "$URL/start" # signal "I have started"
/opt/vendor/report-generator # the job you do not control
EXIT=$?
curl -fsS "$URL/$EXIT" # report the exit code
exit "$EXIT" # pass the job's exit code on to cron, not curl's
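Signalling start lets the monitor also measure run duration and detect a job that started but never finished (hung). Reporting the exit code lets it distinguish "ran fine" from "ran and failed." The exact paths for start and exit-code pings vary between monitoring tools, so check what yours expects.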
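If you wrap more than one job, the same idea can be turned into a reusable function. This is a sketch under assumptions: the name `run_with_heartbeat` is invented here, and the `/start` and `/<exit code>` paths simply follow the placeholder convention used above.

```shell
# Hypothetical generic wrapper.
# Usage: run_with_heartbeat <ping-url> <command> [args...]
run_with_heartbeat() {
    url=$1
    shift
    # A failed ping must never stop the job itself, hence "|| true".
    curl -fsS -m 10 "$url/start" >/dev/null || true
    # Run the wrapped command exactly as given and capture its exit code.
    "$@"
    rc=$?
    curl -fsS -m 10 "$url/$rc" >/dev/null || true
    # Hand the job's own exit code back to the caller.
    return "$rc"
}

# Usage:
# run_with_heartbeat https://your-monitor/ping/abc123 /opt/vendor/report-generator
```

The `-m 10` timeout keeps a slow or unreachable monitor from stalling the job before it even starts.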
Pattern 4: Conditional ping (only ping when the work mattered)
Some jobs run on schedule but only do work sometimes - a queue processor that often finds the queue empty. You do not want a "missing heartbeat" alert when the job legitimately did nothing. Ping only when real work happened, and widen the monitor's expected period accordingly.
#!/bin/bash
PROCESSED=$(process_queue)
if [ "$PROCESSED" -gt 0 ]; then
    curl -fsS "https://your-monitor/ping/abc123?count=$PROCESSED"
fi
Configure the monitor's expected period to the longest realistic gap between real runs. The trade-off is sensitivity: a longer window means slower detection, so use this only when intermittent work is genuinely expected.
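The numeric test in the script above assumes the processor prints a clean integer. A slightly more defensive sketch, with the helper name `ping_if_work_done` invented for illustration, rejects empty or non-numeric output instead of letting the comparison error out:

```shell
# Hypothetical helper: ping only when the count is a positive integer.
ping_if_work_done() {
    count=$1
    case $count in
        ''|*[!0-9]*)
            # Empty or non-numeric output: treat as a failure, do not ping.
            echo "unexpected count: '$count'" >&2
            return 2
            ;;
    esac
    if [ "$count" -gt 0 ]; then
        curl -fsS -m 10 "https://your-monitor/ping/abc123?count=$count" >/dev/null
    fi
}

# Usage, with process_queue standing in for your own job:
# ping_if_work_done "$(process_queue)"
```

A count of zero returns success without pinging, which is the "legitimately did nothing" case.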
Pattern 5: Jitter and grace windows
Real schedules drift. A job set for 02:00 might start at 02:00:04 because of load, and a daily job's runtime varies. If your monitor expects a ping at exactly the interval, normal jitter will page you for nothing.
The fix is a grace period: tell the monitor to wait an extra buffer past the expected time before alerting. A daily backup that usually takes 8 minutes might use a 20-minute grace window. You also want some jitter in scheduling itself when you have many jobs, so they do not all hammer a resource at the same second:
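# Add a small random delay (0-59s) so 02:00 jobs spread out.
# $RANDOM needs bash (cron defaults to /bin/sh), and % must be escaped in a crontab.
SHELL=/bin/bash
0 2 * * * sleep $((RANDOM \% 60)) && /usr/local/bin/backup.sh && curl -fsS https://your-monitor/ping/abc123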
Grace windows are the difference between a monitor you trust and one you mute after the third false alarm. Set the grace to comfortably exceed the job's worst realistic runtime, not its average.
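If you have a few past runtimes to hand, picking a grace window can be reduced to simple arithmetic. The sketch below takes the worst observed runtime and adds headroom; the function name and the 1.5x margin are arbitrary assumptions to tune for your own jobs.

```shell
# Hypothetical helper: suggest a grace window in whole minutes
# from observed runtimes given in seconds.
suggest_grace_minutes() {
    worst=0
    for t in "$@"; do
        if [ "$t" -gt "$worst" ]; then
            worst=$t
        fi
    done
    padded=$((worst * 3 / 2))        # worst case plus 50% headroom
    echo $(( (padded + 59) / 60 ))   # round up to a whole minute
}

# Three past runs: 8 min, 8.5 min, and one slow 15 min run.
suggest_grace_minutes 480 510 900   # prints 23
```

Note that the slow 15-minute run drives the result, not the typical 8 minutes, which is exactly the point of sizing for the worst case.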
Quick reference
| Pattern | Best for | Catches |
|---|---|---|
| Dead man's switch | Simple, single-command jobs | Job did not run / exited non-zero |
| Push on success | Multi-step scripts | Silent "exited zero but did nothing" |
| Exit-code wrapper | Jobs you cannot edit | Hangs, real exit codes, duration |
| Conditional ping | Intermittent-work jobs | Failures without false "missing" alerts |
| Jitter and grace | Anything with variable runtime | Genuinely late runs, without false alarms from drift |
Putting it into practice
Start with Pattern 1 on your most important job today - a backup, a billing run, a data sync. Add verification (Pattern 2) and a grace window (Pattern 5) before you rely on it. For a deeper walkthrough of heartbeat configuration, see our guide on monitoring cron jobs with heartbeats.
If your job talks to a service over a specific port, you can sanity-check reachability first with the free port checker. And when you are ready to monitor scheduled jobs alongside your websites and APIs in one place, see how ePulz.io uptime monitoring works - the 7-day trial is free, no card needed.