False-positive outages: how multi-region monitoring works

In brief: The fastest way to make your team stop paying attention to uptime alerts is to send them false positives. A multi-region cross-check reduces the noise by confirming an outage only when several geographically separated probes report it, not just one network with bad peering.

Why single-region monitoring lies

Classic monitoring has one observation position (one server or cloud region). When this probe doesn't get a response, it reports an outage. But the cause could be:

  • A problem in the probe's own network (route flap, peering issue at its provider)
  • A short-term DNS glitch on the probe's side
  • A geographically limited outage (CDN edge in one country went down)
  • Rate limiting or IP block on your infrastructure's side

From the perspective of real users the site may be completely fine - it is just unreachable from one specific monitoring host.

Consequence: alert fatigue

A team that receives 3 "outage" alerts a week, 2 of which are false positives, gradually stops reacting. When a real outage comes, the response is delayed or missed entirely. This is alert fatigue, a well-documented phenomenon.

The goal is a high signal-to-noise ratio: better 1 alert a month that is always real than 10 alerts of which 7 are noise.

Multi-region pattern: consensus from N probes

The principle:

  1. You have N geographically distributed probes (e.g. EU-Central, US-East, Asia-Pacific).
  2. In each interval all probes test the endpoint in parallel.
  3. You merge the results: an outage is confirmed only if at least M of the N probes report it (typically M = 2 or more).
  4. A single-region failure doesn't escalate: if one probe says "down" while the others say "up", the system stays in the UP state.

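This is a quorum (M-of-N) vote. The idea is borrowed from consensus protocols such as Raft or Paxos, but it is far simpler: the probes don't coordinate with each other, a central server just counts their reports, and the threshold M doesn't have to be a strict majority.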
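The M-of-N rule can be sketched in a few lines of Python. This is only an illustration: the function name, the region names and the True/False encoding of probe results are assumptions, not part of any ePulz.io API.

```python
from typing import Mapping


def is_confirmed_down(results: Mapping[str, bool], threshold: int = 2) -> bool:
    """Return True when at least `threshold` probes report the target as down.

    `results` maps a region name to True (the probe saw the target up)
    or False (the probe saw it down). The names are illustrative only.
    """
    down_votes = sum(1 for is_up in results.values() if not is_up)
    return down_votes >= threshold


# One region failing alone does not confirm an outage...
single = {"eu-central": True, "us-east": True, "asia-pacific": False}
# ...but two regions agreeing does.
double = {"eu-central": False, "us-east": False, "asia-pacific": True}

print(is_confirmed_down(single))
print(is_confirmed_down(double))
```

The first call leaves the system in the UP state, the second confirms the outage.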

Practical setup

In the ePulz.io admin panel multi-region is turned on with one switch and configured via:

  • Active regions - list of workers, typically 3-5
  • Consensus threshold - how many regions must say DOWN (default: 2)
  • Worker token - shared secret between main server and workers for auth
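The article only says that the worker token is a shared secret. How it is transmitted and compared is not described, so the following is a minimal sketch under the assumption that the worker receives the token as a string and compares it with its configured value. The function name is hypothetical; `hmac.compare_digest` is the standard-library constant-time comparison.

```python
import hmac


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare a presented worker token with the configured one.

    hmac.compare_digest runs in time independent of where the two values
    differ, which avoids leaking the secret through timing differences.
    """
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` comparison would also work functionally. The constant-time variant is simply the safer habit for secrets.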

On each check the main server calls all workers in parallel over an HTTP API. Each worker performs a local HTTP/SSL/TCP/DNS test and returns the result. The main server counts the DOWN reports and escalates an alert only when the threshold is reached.
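The fan-out described above might look roughly like this. It is a sketch, not ePulz.io's implementation: each worker call is modelled as a hypothetical callable that returns True (up) or False (down), where a real system would make an HTTP request to the worker. A worker that fails to answer is recorded as `None` and is not counted as a DOWN vote.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Hypothetical stand-in for "call one worker over HTTP and read its verdict".
Probe = Callable[[], bool]


def collect_results(probes: Dict[str, Probe]) -> Dict[str, Optional[bool]]:
    """Run every probe in parallel; None marks a worker that failed to answer.

    Each probe is expected to enforce its own network timeout.
    """
    results: Dict[str, Optional[bool]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(probes))) as pool:
        futures = {region: pool.submit(probe) for region, probe in probes.items()}
        for region, future in futures.items():
            try:
                results[region] = future.result()
            except Exception:
                # The worker itself is unreachable or crashed: no verdict.
                results[region] = None
    return results


def should_alert(results: Dict[str, Optional[bool]], threshold: int = 2) -> bool:
    """Escalate only when at least `threshold` workers explicitly said DOWN."""
    down_votes = sum(1 for verdict in results.values() if verdict is False)
    return down_votes >= threshold
```

Keeping "no answer" separate from "DOWN" matters: a broken worker should not be able to push the target into an alert on its own.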

Trade-offs

Pros:

  • Drastically fewer false-positive alerts
  • Geographic visualization - you see from which regions the site is unreachable
  • Detection of regional outages (Cloudflare PoP problem, ISP route issue)

Cons:

  • Slightly longer latency from real outage to alert (waiting for consensus from multiple sources)
  • Higher demands on infrastructure / plan price
  • Worker availability - if half the workers are themselves down, the threshold may not be reachable (solution: dynamic threshold = M of currently live probes)
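The article does not define the dynamic threshold precisely. One possible reading, sketched here as an assumption, is to cap the configured threshold at the number of probes that actually answered, and to report an unknown state when none did. The function names and the "UNKNOWN" state are illustrative.

```python
from typing import Mapping, Optional


def effective_threshold(configured: int, live_probes: int) -> int:
    """Cap the configured threshold at the number of probes that answered."""
    return max(1, min(configured, live_probes))


def evaluate(results: Mapping[str, Optional[bool]], configured_threshold: int = 2) -> str:
    """Return 'DOWN', 'UP' or 'UNKNOWN'.

    `results` maps region -> True (up), False (down) or None (no answer).
    """
    live = {region: up for region, up in results.items() if up is not None}
    if not live:
        return "UNKNOWN"  # no probe answered, so nothing can be concluded
    down_votes = sum(1 for up in live.values() if not up)
    threshold = effective_threshold(configured_threshold, len(live))
    return "DOWN" if down_votes >= threshold else "UP"
```

The cost of this variant is that with a single live probe the threshold drops to 1, which brings back exactly the single-probe false positives the pattern is meant to avoid. A stricter policy could refuse to confirm anything below a minimum number of live probes.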

Consensus calculation example

Configuration: 4 probes (Frankfurt, Amsterdam, Virginia, Singapore), threshold = 2.

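| Scenario                    | FRA  | AMS  | IAD  | SIN  | Alert?        |
|-----------------------------|------|------|------|------|---------------|
| Everything OK               | UP   | UP   | UP   | UP   | No            |
| Singapore has route problem | UP   | UP   | UP   | DOWN | No (only 1)   |
| EU region down              | DOWN | DOWN | UP   | UP   | Yes (2≥2)     |
| Global outage               | DOWN | DOWN | DOWN | DOWN | Yes           |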

How to deploy your own workers

A worker is a simple service (an HTTP POST endpoint, /check) that performs a test and returns the result. ePulz.io supports self-hosted workers over a WireGuard tunnel, so a worker can run on any VPS without a public IP and communicate with the main server through the encrypted tunnel.
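To give a feel for how small such a worker can be, here is a sketch built only on the Python standard library. Only the /check path comes from the article. The JSON request and response fields (`host`, `port`, `up`), the TCP-only test and all function names are assumptions for illustration, and token authentication is left out for brevity.

```python
import json
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def tcp_check(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def handle_check(body: bytes, check=tcp_check) -> dict:
    """Parse a JSON request (hypothetical schema) and run a single check."""
    request = json.loads(body)
    host, port = request["host"], int(request["port"])
    return {"host": host, "port": port, "up": check(host, port)}


class CheckHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/check":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = handle_check(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            self.send_error(400)  # malformed JSON or missing fields
            return
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(bind_addr: str = "127.0.0.1", port: int = 8080) -> None:
    """Start the worker; in a tunnel setup bind_addr would be the tunnel address."""
    HTTPServer((bind_addr, port), CheckHandler).serve_forever()
```

Calling `serve()` starts the worker. Binding it to the tunnel interface rather than a public address keeps the endpoint reachable only from the main server.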

Configuration takes ~10 minutes per worker (apt install wireguard, copy the peer config, systemctl enable). This gives you truly independent observation positions, rather than probes that all sit in the same Frankfurt datacenter.

Conclusion

Multi-region monitoring isn't a marketing buzzword. It's a concrete engineering pattern (quorum / consensus) that moves monitoring from "I see what one network position sees" to "I see what the internet sees". For business-critical applications, this is the standard today.

Eliminate false-positive alerts

Multi-region cross-check in basic plans (not just Enterprise). 7 days free.
