SLI vs SLO vs SLA: differences and how to set them up

· 7 min read

In brief: the SLI is what you measure, the SLO is the target you want to reach, and the SLA is the contractual commitment. These terms from the Google SRE book are among the most used in operating web services - and they are often confused.

SLI - Service Level Indicator

A concrete metric by which you quantify service reliability. Examples:

  • % of requests that ended in HTTP 2xx or 3xx in the last 30 days
  • % of requests with response time under 500 ms
  • % of correctly delivered emails (delivered, not bounced)
  • Ratio of successful payment transactions to all attempts

Key properties of a good SLI:

  • Measurable - there is a concrete method of data collection
  • User-relevant - reflects real customer experience, not an internal technical metric
  • Specific - "uptime" is too vague; "% of successful requests to /api/v1/orders in a 5-minute window" is an SLI
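As a minimal sketch of how the first two example SLIs could be computed from raw request records: the `Request` type, its field names, and the sample data are assumptions made for illustration, not part of any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; field names are assumed for this sketch."""
    status: int        # HTTP status code
    latency_ms: float  # response time in milliseconds


def availability_sli(requests: list[Request]) -> float:
    """Percentage of requests that ended in HTTP 2xx or 3xx."""
    if not requests:
        raise ValueError("no requests in the window")
    good = sum(1 for r in requests if 200 <= r.status < 400)
    return 100.0 * good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float = 500.0) -> float:
    """Percentage of requests with response time under threshold_ms."""
    if not requests:
        raise ValueError("no requests in the window")
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return 100.0 * fast / len(requests)


# Made-up sample: four requests, one server error, two slow responses.
sample = [
    Request(200, 120.0),
    Request(302, 80.0),
    Request(500, 900.0),
    Request(200, 650.0),
]
print(availability_sli(sample))  # 75.0
print(latency_sli(sample))       # 50.0
```

In practice the records would come from access logs or a metrics store and be restricted to the chosen time window before these functions are called.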

SLO - Service Level Objective

An internal target for what value the SLI should reach. Expressed as a percentage over a time window.

Examples:

  • "99.9 % of requests to /api/orders should end with HTTP 2xx in 30 days"
  • "95 % of requests should have response time under 200 ms in 7 days"
  • "99.5 % of payment transactions will succeed within a calendar month"

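The SLO should be stricter than the SLA so that you have a buffer. If the SLA says 99.9 %, the internal SLO should be e.g. 99.95 % - an internal SLO breach then warns you before the contractual limit is reached, and you keep a reserve for unexpected incidents.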
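The buffer between SLO and SLA can be illustrated with a small check. This is only a sketch: the function name `meets_target` and the event counts are assumptions chosen for the example.

```python
def meets_target(good_events: int, total_events: int, target_percent: float) -> bool:
    """Return True if the measured SLI reaches the given target percentage."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    # Equivalent to good / total >= target / 100, written without division.
    return good_events * 100.0 >= target_percent * total_events


# Assumed measurement: 999,300 good requests out of 1,000,000 (SLI = 99.93 %).
good, total = 999_300, 1_000_000

print(meets_target(good, total, 99.95))  # False: the internal SLO is missed
print(meets_target(good, total, 99.9))   # True: the contractual SLA still holds
```

With these numbers the team gets an internal warning while the customer-facing commitment is still intact, which is the purpose of the buffer.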

SLA - Service Level Agreement

A contractual commitment to customers. It defines what happens when you don't meet the agreed service level - typically:

  • Service credits - you refund a portion of the monthly fee (10-50% depending on the magnitude of the breach)
  • Termination rights - the customer can terminate the contract without penalty
  • Reporting obligation - you must publish a postmortem and uptime reports

SLA has legal consequences. SLO is an internal target.

Error budget

A key SRE concept: the downtime you can afford without violating the SLO.

Example: SLO = 99.9 % uptime in 30 days. That leaves 0.1 % of allowed downtime. 30 days is 43,200 minutes, and 0.1 % of that is 43.2 minutes - roughly 43 minutes per month. This is your error budget.

Practical implications:

  • If you've had 35 min downtime in the month, 8 min remain before "SLO breach". The team should be conservative with further deploys.
  • If you've had 5 min downtime, you have 38 min budget for risks - you can make more aggressive changes, A/B tests, experiments.
  • Error budget resolves the conflict between speed of innovation (dev team) and stability (ops team). Both track the same number.
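The arithmetic above can be sketched in a few lines. The function names are hypothetical; the text rounds the budget to 43 minutes, while the unrounded value is 43.2.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an uptime SLO over the window."""
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


def remaining_budget_minutes(
    slo_percent: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Budget left after the downtime already used; negative means SLO breach."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


print(f"{error_budget_minutes(99.9):.1f}")          # 43.2
print(f"{remaining_budget_minutes(99.9, 35):.1f}")  # 8.2
print(f"{remaining_budget_minutes(99.9, 5):.1f}")   # 38.2
```

A negative result from `remaining_budget_minutes` means the budget is exhausted and the SLO for the window has been breached.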

Practical example: e-commerce API

SLI: % of HTTP requests to POST /api/checkout that ended in 2xx, measured in 1-minute buckets over the last 30 days.

SLO: ≥ 99.9 % successful requests in a rolling 30-day window.

SLA (for Enterprise customers):

  • ≥ 99.5 % uptime in a calendar month
  • At 99.0-99.5 % = 10% credit of monthly fee
  • At 95.0-99.0 % = 25% credit
  • At < 95.0 % = 50% credit + right to terminate the contract

Error budget: a 99.9 % SLO leaves about 43 min of downtime per 30 days. The 99.5 % SLA threshold corresponds to about 216 min in a 30-day month, so there is a further buffer before any economic penalty.
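The credit tiers of this example SLA could be encoded roughly as follows. The tiers as listed share their boundary values (99.0 % and 95.0 %), so this sketch assumes the lower bound of each tier is inclusive; a real contract would have to state this explicitly. The function names are hypothetical.

```python
def sla_credit_percent(uptime_percent: float) -> int:
    """Credit (as % of the monthly fee) owed for a calendar month's uptime."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    if uptime_percent >= 99.5:
        return 0   # SLA met, no credit
    if uptime_percent >= 99.0:  # assumption: 99.0 itself falls in the 10 % tier
        return 10
    if uptime_percent >= 95.0:  # assumption: 95.0 itself falls in the 25 % tier
        return 25
    return 50


def may_terminate(uptime_percent: float) -> bool:
    """The customer may terminate the contract only below 95.0 % uptime."""
    return uptime_percent < 95.0


for uptime in (99.9, 99.2, 97.0, 94.0):
    print(uptime, sla_credit_percent(uptime), may_terminate(uptime))
```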

Summary: table

| Term | What it is | For whom |
| --- | --- | --- |
| SLI | Concrete reliability metric | Engineering team |
| SLO | Internal target for SLI | Engineering + product |
| SLA | Contractual commitment | Customer + legal |
| Error budget | Downtime you can afford before SLO breach | Engineering risk management |

Practical mistakes

  • SLO too ambitious. 99.99 % leaves only about 4.3 minutes of downtime in 30 days and typically requires active-active redundancy in multiple regions. Unrealistic for a small company.
  • Only uptime SLO. The site can be "up" and still unusable. Add a latency SLO and an error rate SLO.
  • SLA without automatic measurement. A manually calculated SLA report is untrustworthy. Invest in automated uptime tracking.
  • SLO without consequences. If an SLO breach triggers nothing, nobody will care about the SLO. Link a breach to a deploy freeze, on-call escalation, etc.

Conclusion

The SLI/SLO/SLA framework isn't paper bureaucracy - it's the language by which the engineering team communicates with business stakeholders about reliability. Without these terms the discussion about stability becomes subjective ("our site is unstable"). With them it's numerical ("in the last 30 days our SLI was 99.87 %, which is below our 99.9 % SLO - here's the action plan").

Measure SLI in real time

ePulz.io provides a historical uptime record with 30/90/365-day rollup. Foundation for SLO reporting.
