Multi-region monitoring: how to eliminate false-positive outages

False alarms train a team to ignore alerts. A multi-region cross-check only flags an outage once several independent probes confirm it.

Why single-region monitoring lies

Classic monitoring has a single observation point (one server or cloud region). When that probe gets no response, it reports an outage. The cause, however, can be any of the following:

A problem in the probe's own network (route flap, a peering issue at its provider)
A short-lived DNS glitch on the probe's side
A geographically limited outage (a CDN edge in one country went down)
Rate limiting or an IP block on your infrastructure's side

From real users' perspective, the site may be perfectly fine - just unreachable for one specific monitoring host.

The consequence: alert fatigue

A team that gets 3 "outage" notifications a week, 2 of which are false positives, will gradually stop reacting. When a real outage finally arrives, the response is delayed or nobody notices at all. This phenomenon is called alert fatigue, and it is well documented in the DevOps literature.

The goal is the best possible signal-to-noise ratio. It is better to get 1 notification a month that is always real than 10 notifications of which 7 are noise.

The multi-region pattern: consensus from N probes

The principle:

You have 3 worker nodes in 3 cities (primary in Liptovský Hrádok, eu2 in Liptovský Mikuláš, eu1 in Bratislava). The default threshold is 2 of 3, so a majority is enough and unanimity is not required. The architecture supports any number of nodes; nodes added later join the same consensus mechanism.
At each interval all probes test the endpoint in parallel.
You merge the results: an outage is confirmed if M of N probes report it (typically M = 2 or more).
A single probe failure does not trigger an alarm - if one probe reports "down" but the others report "up", the system stays in the UP state.

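This is quorum (majority) voting. It borrows the majority idea that consensus protocols such as Raft and Paxos rely on, but it is far simpler: the main server only counts votes, and the probes do not need to agree among themselves.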
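
The M-of-N rule can be sketched in a few lines of Python. This is a minimal illustration, not the ePulz.io implementation; the function name `outage_confirmed` and the dictionary shape are assumptions made for the example.

```python
from typing import Mapping


def outage_confirmed(results: Mapping[str, bool], threshold: int = 2) -> bool:
    """Return True when at least `threshold` probes report the target as down.

    `results` maps a probe name to True (UP) or False (DOWN).
    """
    down_votes = sum(1 for is_up in results.values() if not is_up)
    return down_votes >= threshold


# One probe sees a failure, the other two do not: no alert.
print(outage_confirmed({"primary": False, "eu1": True, "eu2": True}))   # False
# Two of three probes agree that the target is down: alert.
print(outage_confirmed({"primary": False, "eu1": True, "eu2": False}))  # True
```

The comparison is `>=`, so a threshold of 2 fires on exactly two DOWN votes.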

Practical setup

In the ePulz.io admin panel, multi-region is enabled with a single toggle and configured via:

Active regions - the list of workers, typically 3-5
Consensus threshold - how many regions must say DOWN (default: 2)
Worker token - a shared secret between the main server and the workers for verification

On each check, the main server reaches all workers in parallel over the HTTP API. A worker runs a local HTTP, TCP or ping test and returns the result. The main server evaluates consensus and escalates the alert only once the threshold is reached.
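
The parallel fan-out might look roughly like the sketch below. The actual worker API is not described here, so the network call is replaced by an injectable `probe` function; `check_all_workers`, `fake_probe` and the use of `None` for "the worker gave no answer" are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Hypothetical probe signature: takes a worker name and returns True when the
# worker saw the target UP, False when it saw it DOWN.
Probe = Callable[[str], bool]


def check_all_workers(
    workers: List[str], probe: Probe, timeout: float = 10.0
) -> Dict[str, Optional[bool]]:
    """Ask every worker in parallel; None means the worker itself gave no answer."""
    results: Dict[str, Optional[bool]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
        futures = {name: pool.submit(probe, name) for name in workers}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                # Worker unreachable, crashed or too slow: no vote either way.
                results[name] = None
    return results


def fake_probe(name: str) -> bool:
    """Stand-in for the real HTTP call to a worker (illustrative only)."""
    if name == "eu2":
        raise ConnectionError("worker unreachable")
    return name != "primary"  # only "primary" reports DOWN


results = check_all_workers(["primary", "eu1", "eu2"], fake_probe)
down_votes = sum(1 for r in results.values() if r is False)
print(results, "alert:", down_votes >= 2)
```

Keeping "worker did not answer" separate from "worker saw the target DOWN" means an unreachable worker never counts as an outage vote. In a real deployment the probe should also enforce its own network timeout, because the thread pool waits for running probes to finish before it shuts down.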

Trade-offs

Pros:

Drastically fewer false alarms
Geographic visualization - you see which regions the site is failing from
Detection of regional outages (a Cloudflare PoP problem, faulty routing at an ISP)

Cons:

Slightly longer delay between a real outage and the alert (the system waits until several probes agree)
Higher infrastructure demands and a higher pricing plan
Worker availability - if enough workers are themselves unreachable, the remaining ones can no longer reach the threshold (the fix is a dynamic threshold: M of the probes that are currently live)
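
One possible reading of the dynamic threshold is sketched below: never demand more DOWN votes than a majority of the probes that actually answered. The rule `min(configured, live // 2 + 1)` and the function name `evaluate` are assumptions for illustration, not the documented ePulz.io behaviour.

```python
from typing import Mapping, Optional


def evaluate(results: Mapping[str, Optional[bool]], configured: int = 2) -> Optional[bool]:
    """Return True (alert), False (no alert) or None (no live probes, no decision).

    In `results`, True means UP, False means DOWN, None means the worker did
    not answer and is excluded from the vote.
    """
    live = [r for r in results.values() if r is not None]
    if not live:
        return None
    # Cap the configured threshold at a majority of the live probes.
    threshold = min(configured, len(live) // 2 + 1)
    down_votes = sum(1 for r in live if r is False)
    return down_votes >= threshold


# Two workers are offline; the single live probe decides on its own.
print(evaluate({"primary": False, "eu1": None, "eu2": None}))  # True
# One worker is offline; the two live probes disagree, so no alert.
print(evaluate({"primary": False, "eu1": True, "eu2": None}))  # False
```

With only one live probe this degrades to single-region monitoring, so it may be worth raising a separate alert about the workers themselves in that case.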

A consensus calculation example

Configuration: the 3 worker nodes currently deployed, primary in Liptovský Hrádok (SK), eu2 in Liptovský Mikuláš (SK) and eu1 in Bratislava (SK), with threshold = 2.

Scenario	primary (Liptovský Hrádok)	eu1 (Bratislava)	eu2 (Liptovský Mikuláš)	Alert?
Everything OK	UP	UP	UP	No
BGP flap between Liptov and your hosting	DOWN	UP	UP	No (1 of 3)
HW failure of the primary machine	DOWN	UP	DOWN	Yes (2 of 3)
A real outage of your server	DOWN	DOWN	DOWN	Yes
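
To get a feel for how much noise the 2-of-3 rule in the table removes, here is a back-of-the-envelope calculation. It assumes each probe produces a false DOWN independently with the same probability `p`; the value 1% is purely illustrative, not a measurement.

```python
from math import comb


def false_alarm_probability(p: float, n: int = 3, m: int = 2) -> float:
    """Probability that at least m of n independent probes report a false DOWN."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


# A single probe with a 1% false-DOWN rate versus a 2-of-3 quorum of such probes.
print(f"{false_alarm_probability(0.01, n=1, m=1):.6f}")  # 0.010000
print(f"{false_alarm_probability(0.01):.6f}")            # 0.000298
```

Independence is the optimistic case. Probes that share a network path or a provider tend to fail together, which is why spreading them across different locations matters.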

How to deploy your own workers

A worker is a simple service that accepts check tasks over HTTPS, runs the test and returns the result. ePulz.io supports custom workers over a WireGuard tunnel, so they can run on any VPS without a public IP and communicate with the main server over an encrypted tunnel.
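
The core of such a worker can be sketched with the standard library alone. This is not the ePulz.io worker code: the function names, the result shape and the token handling are assumptions, and the HTTPS layer that would receive the task is omitted.

```python
import hmac
import socket
import time
from typing import Dict, Union


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare the shared secret in constant time to avoid timing leaks."""
    return hmac.compare_digest(presented.encode(), expected.encode())


def tcp_check(host: str, port: int, timeout: float = 5.0) -> Dict[str, Union[bool, float]]:
    """Try to open a TCP connection; report success and the elapsed time in ms."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            up = True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        up = False
    elapsed_ms = round((time.monotonic() - started) * 1000, 1)
    return {"up": up, "elapsed_ms": elapsed_ms}


# A request carrying the wrong token would be rejected before any check runs.
print(token_is_valid("wrong-token", "hypothetical-shared-secret"))  # False
```

A call such as `tcp_check("example.com", 443)` returns a dictionary like `{"up": ..., "elapsed_ms": ...}` that the worker could serialize and send back to the main server.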

Configuring a single worker takes about 10 minutes in practice (apt install wireguard, copy the peer config, systemctl enable). You get genuinely independent observation points that combine geographic diversity (different cities) with hardware redundancy (different machines in the same region).

Conclusion

Multi-region monitoring is not just a marketing buzzword. It is a concrete engineering pattern (quorum, or consensus) that moves monitoring from "I see what one network position sees" to "I see what the internet sees". For critical business applications it is the standard today.
