False-positive outages: how multi-region monitoring works
In brief: The fastest way to make your team ignore uptime alerts is to send false positives. A multi-region cross-check reduces noise by confirming an outage only when several geographically separated probes report it - not just one network with bad peering.
Why single-region monitoring lies
Classic monitoring has a single vantage point (one server or one cloud region). When that probe gets no response, it reports an outage. But the cause could be:
- A problem in the probe's own network (a route flap, a peering issue at its provider)
- A short-term DNS glitch on the probe's side
- A geographically limited outage (CDN edge in one country went down)
- Rate limiting or IP block on your infrastructure's side
From real users' perspective the website may be completely fine - it is just unreachable from one specific monitoring host.
Consequence: alert fatigue
A team that receives 3 "outage" alerts a week, 2 of which are false positives, gradually stops reacting. When a real outage comes, the response is delayed or missed entirely. This is alert fatigue - a well-documented phenomenon.
The goal is a high signal-to-noise ratio. Better 1 alert a month that is always real than 10 alerts of which 7 are noise.
Multi-region pattern: consensus from N probes
The principle:
- You have N geographically distributed probes (e.g. EU-Central, US-East, Asia-Pacific).
- In each interval all probes test the endpoint in parallel.
- You merge the results: an outage is confirmed only if at least M of the N probes report it (typically M = 2 or more).
- A single-region failure doesn't escalate - even if one probe says "down", the others say "up", the system stays in UP state.
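This is a quorum rule (M-of-N voting). It borrows the quorum idea used by consensus protocols such as Raft or Paxos but is far simpler: the probes do not coordinate with each other, the main server just counts their votes against a threshold, and that threshold does not have to be a majority.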
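The M-of-N rule above can be sketched in a few lines of Python. This is a minimal illustration, not ePulz.io's implementation: the function name `outage_confirmed` and the probe names are assumptions made for the example.

```python
from typing import Mapping


def outage_confirmed(results: Mapping[str, bool], threshold: int = 2) -> bool:
    """Return True when at least `threshold` probes report the endpoint as down.

    `results` maps a probe name to True (endpoint reachable) or False (down).
    """
    down = sum(1 for is_up in results.values() if not is_up)
    return down >= threshold


# Hypothetical probe names, mirroring the regions listed above.
one_bad_probe = {"eu-central": True, "us-east": True, "asia-pacific": False}
two_bad_probes = {"eu-central": False, "us-east": False, "asia-pacific": True}

print(outage_confirmed(one_bad_probe))   # False: a single DOWN vote is not enough
print(outage_confirmed(two_bad_probes))  # True: 2 of 3 probes agree
```

Note that the comparison is `>=`: the outage is confirmed as soon as the threshold is reached, not only when it is exceeded.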
Practical setup
In the ePulz.io admin panel multi-region is turned on with one switch and configured via:
- Active regions - list of workers, typically 3-5
- Consensus threshold - how many regions must say DOWN (default: 2)
- Worker token - shared secret between main server and workers for auth
On each check the main server calls all workers in parallel via an HTTP API. Each worker performs a local HTTP/SSL/TCP/DNS test and returns the result. The main server counts the DOWN votes and escalates an alert only when the threshold is reached.
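The fan-out step might look roughly like the sketch below. It is not the product's code: `run_check`, the `Probe` callable, and the choice to ignore workers that fail to answer are all assumptions for illustration. A real probe function would call the worker's HTTP API; here it is injected so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Hypothetical probe signature: region name in, True if the endpoint looked UP.
Probe = Callable[[str], bool]


def run_check(regions: List[str], probe: Probe, threshold: int = 2) -> Dict[str, object]:
    """Query every region in parallel and apply the M-of-N threshold."""

    def safe_probe(region: str) -> Optional[bool]:
        try:
            return probe(region)
        except Exception:
            # Assumption: a worker that cannot be reached casts no vote at all.
            return None

    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        votes = dict(zip(regions, pool.map(safe_probe, regions)))

    down = [region for region, is_up in votes.items() if is_up is False]
    return {"votes": votes, "down": down, "alert": len(down) >= threshold}


def fake_probe(region: str) -> bool:
    # Stand-in for a real worker call: only Singapore sees a failure.
    return region != "singapore"


result = run_check(["frankfurt", "virginia", "singapore"], fake_probe)
print(result["down"], result["alert"])
```

With one region reporting DOWN and a threshold of 2, `result["alert"]` stays `False`.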
Trade-offs
Pros:
- Drastically fewer false-positive alerts
- Geographic visualization - you see from which regions the site is unreachable
- Detection of regional outages (Cloudflare PoP problem, ISP route issue)
Cons:
- Slightly longer latency from real outage to alert (waiting for consensus from multiple sources)
- Higher demands on infrastructure / plan price
- Worker availability - if half the workers are themselves down, the threshold may become unreachable (solution: a dynamic threshold, i.e. M counted against the probes that are currently live)
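One way to read the dynamic-threshold idea from the last point is to cap the required number of DOWN votes at the number of probes that actually answered. The sketch below assumes that interpretation; the function name and the use of `None` for a worker that did not report are illustrative choices, not documented behavior.

```python
from typing import Mapping, Optional


def confirmed_with_dynamic_threshold(
    results: Mapping[str, Optional[bool]],
    configured_threshold: int = 2,
) -> bool:
    """Apply an M-of-N rule where N is the number of probes that answered.

    True = endpoint UP, False = endpoint DOWN, None = the probe did not report.
    """
    live = [vote for vote in results.values() if vote is not None]
    if not live:
        return False  # assumption: with no evidence either way, stay silent
    # Never demand more DOWN votes than there are live probes.
    threshold = min(configured_threshold, len(live))
    down = sum(1 for vote in live if vote is False)
    return down >= threshold


# Four probes, threshold 3, but two workers are offline.
votes = {"fra": False, "ams": False, "iad": None, "sin": None}
print(confirmed_with_dynamic_threshold(votes, configured_threshold=3))  # True
```

The cost of this choice is that with a single live probe the rule degrades to single-region monitoring, so a deployment may prefer to flag "too few live workers" as its own condition instead.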
Consensus calculation example
Configuration: 4 probes (Frankfurt, Amsterdam, Virginia, Singapore), threshold = 2.
| Scenario | FRA | AMS | IAD | SIN | Alert? |
|---|---|---|---|---|---|
| Everything OK | UP | UP | UP | UP | No |
| Singapore has a routing problem | UP | UP | UP | DOWN | No (1<2) |
| EU region down | DOWN | DOWN | UP | UP | Yes (2≥2) |
| Global outage | DOWN | DOWN | DOWN | DOWN | Yes |
How to deploy your own workers
A worker is a simple service (an HTTP POST endpoint, /check) that performs a test and returns the result. ePulz.io supports self-hosted workers over a WireGuard tunnel, so a worker can run on any VPS without a public IP and communicate with the main server through an encrypted tunnel.
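To make the shape of such a worker concrete, here is a minimal sketch built only on Python's standard library. The source does not specify the request or response format, so the JSON fields (`host`, `port`, `up`), the `X-Worker-Token` header name, and the placeholder secret are all assumptions; only the POST /check path and the shared-token idea come from the text above.

```python
import hmac
import json
import socket
from http.server import BaseHTTPRequestHandler

WORKER_TOKEN = "change-me"  # hypothetical shared secret; load from config in practice


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def handle_check(payload: dict, checker=tcp_check) -> dict:
    """Run one check described by the (assumed) payload and build the reply."""
    host, port = payload["host"], int(payload.get("port", 443))
    return {"host": host, "port": port, "up": checker(host, port)}


class CheckHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/check":
            self.send_error(404)
            return
        token = self.headers.get("X-Worker-Token", "")  # header name is an assumption
        if not hmac.compare_digest(token.encode(), WORKER_TOKEN.encode()):
            self.send_error(401)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(handle_check(payload)).encode()
        except (ValueError, KeyError, TypeError):
            self.send_error(400)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, pass `CheckHandler` to `http.server.HTTPServer` bound to the worker's tunnel address and call `serve_forever()`. Keeping `handle_check` separate from the handler makes the check logic testable without opening a socket.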
In practice, configuration takes ~10 minutes per worker (apt install wireguard, copy the peer config, systemctl enable). This gives you truly independent vantage points - not all of them in the same Frankfurt datacenter.
Conclusion
Multi-region monitoring isn't a marketing buzzword. It's a concrete engineering pattern (quorum voting) that moves monitoring from "I see what one network position sees" to "I see what the internet sees". For business-critical applications, it is the sensible default today.
Eliminate false-positive alerts
Multi-region cross-check in basic plans (not just Enterprise). 7 days free.
Try ePulz.io free - 7 days, no credit card needed.