SLIs, SLOs, and SLAs

Service agreements, terms of service, contracts, and many other kinds of agreements exist so that two parties can write down what they have agreed to and then be held to it. You need a contract when one party pays another for a service, when two parties exchange services, when one party accepts a user agreement drawn up by the other party (ever read one of those?), and in a lot of other situations.

Let’s break down what each of the three terms means:

  • Service level indicators (SLIs): These are metrics that can be used to numerically define the level of service that is being provided by a product. For instance, if you were to run a website, you could use the uptime (the amount of time the website is available for service) as an SLI.
  • Service level objectives (SLOs): These attach a specific target value to the aforementioned SLIs. That value is an objective that the DevOps team must meet for their client. Going back to the previous example in the SLI definition: if uptime is the SLI, then having an uptime of 99% per month is the SLO. Taking a month as 30 days, which is 720 hours, the website should have a minimum uptime of 712.8 hours in that month, with a tolerable downtime of 7.2 hours.
  • Service level agreements (SLAs): These are contracts that enforce an SLO. In an SLA, there is a defined SLO (hope you’re keeping up now) for an SLI which must be achieved by the DevOps team. If this SLA is not fulfilled, the party that contracted the DevOps team is entitled to some compensation. Concluding that example, if there is an SLA for that website with an SLO of 99% uptime, then that is defined in the agreement and that is the metric that needs to be fulfilled by the DevOps team. However, most SLAs have more than one SLO.

To put it simply, SLIs (are measured for) -> SLOs (are defined in) -> SLAs.
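The uptime arithmetic from the SLO example can be sketched in a few lines of Python. This is only an illustration under the same assumption as the text (a 30-day month); the function names `downtime_budget_hours` and `slo_met` are made up for this sketch and do not come from any monitoring tool.

```python
def downtime_budget_hours(slo_percent, days_in_period=30):
    """Return (required uptime hours, tolerable downtime hours) for an uptime SLO."""
    total_hours = days_in_period * 24              # 30 days -> 720 hours
    required_uptime = total_hours * slo_percent / 100
    tolerable_downtime = total_hours - required_uptime
    return required_uptime, tolerable_downtime


def slo_met(measured_uptime_hours, slo_percent, days_in_period=30):
    """Compare the measured SLI (uptime in hours) against the SLO target."""
    required_uptime, _ = downtime_budget_hours(slo_percent, days_in_period)
    return measured_uptime_hours >= required_uptime


if __name__ == "__main__":
    uptime, downtime = downtime_budget_hours(99.0)
    print(f"required uptime: {uptime:.1f} h, tolerable downtime: {downtime:.1f} h")
    # Hypothetical measurement: the site was up 715 of the 720 hours.
    print("SLO met:", slo_met(715.0, 99.0))
```

Here the measured uptime plays the role of the SLI, the 99% target is the SLO, and an SLA would be the contract that says what happens when `slo_met` comes back false.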

One of the more prominent examples that the AWS team likes to show off is the 11 9s (99.999999999%) of durability that Amazon’s Simple Storage Service (S3) is designed for (other cloud object storage services quote similar figures). AWS illustrates this by saying that if you store 10,000,000 objects in S3, you can expect to lose a single object, on average, once every 10,000 years. Strictly speaking, this durability figure is a design target; the published S3 SLA covers availability, which for the standard tier is 99.9%. This is equivalent to being down for about 43 minutes (43.2, to be exact) out of a calendar month of 30 days.
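Both S3 figures can be checked with a little arithmetic. The sketch below is a rough illustration, not an AWS formula: it assumes that "n nines" of durability can be read as an annual per-object loss probability of 10^-n, and the function names are hypothetical.

```python
def monthly_downtime_minutes(availability_percent, days=30):
    """Downtime allowed per month at a given availability percentage."""
    total_minutes = days * 24 * 60                 # 30 days -> 43,200 minutes
    return total_minutes * (100 - availability_percent) / 100


def years_per_lost_object(object_count, durability_nines):
    """Average years between single-object losses.

    Assumption: n nines of durability means each object has an annual
    loss probability of 10**-n.
    """
    expected_losses_per_year = object_count * 10.0 ** -durability_nines
    return 1 / expected_losses_per_year


if __name__ == "__main__":
    print(f"{monthly_downtime_minutes(99.9):.1f} minutes of downtime per month")
    print(f"{years_per_lost_object(10_000_000, 11):.0f} years per lost object")
```

Under that reading, 11 nines across 10,000,000 stored objects works out to one expected loss roughly every 10,000 years, and 99.9% availability leaves a budget of about 43 minutes in a 30-day month.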

Now, these three abbreviations are related to availability, but in an ancillary way. The next two abbreviations will be much more focused on what availability actually entails contractually and goal-wise.