Byte Journeys

Join me as I share insights and discoveries from my journey in the world as a software engineering manager by day and tinkerer by night.

Service Reliability Math - and what it means

16 January 2024

In the context of the cloud, you might read about AWS’s service availability or service durability (read here about Amazon S3’s availability and durability). But what does it mean if a service has “five 9’s” (or 99.999%) uptime? Availability translates easily into time, so here is a breakdown of the downtime each level allows per year:

Availability Downtime (Yearly)
99.00000% 3d 15h 39m
99.90000% 8h 45m 56s
99.99000% 52m 35s
99.99900% 5m 15s
99.99990% 31s
99.99999% 3s
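
The table can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are illustrative, and it assumes an average Gregorian year of 365.2425 days with results truncated to whole seconds, which appears to match the numbers above.

```python
# Seconds in an average Gregorian year (365.2425 days); this is an assumption.
SECONDS_PER_YEAR = 31_556_952


def yearly_downtime_seconds(availability_percent: float) -> float:
    """Return the downtime per year that a given availability still allows."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def format_duration(seconds: float) -> str:
    """Format seconds as e.g. '8h 45m 56s', truncating fractions of a second."""
    total = int(seconds)
    days, rest = divmod(total, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, secs = divmod(rest, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    return " ".join(f"{value}{unit}" for value, unit in parts if value) or "0s"


if __name__ == "__main__":
    for availability in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        downtime = format_duration(yearly_downtime_seconds(availability))
        print(f"{availability:.5f}% {downtime}")
```

For example, `format_duration(yearly_downtime_seconds(99.99))` gives `52m 35s`.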

According to the AWS docs, S3 is designed for 99.99% availability. That sounds great at first, but perhaps less so once you express it as 52 minutes and 35 seconds of downtime per year. It is better not to take the service’s availability for granted in your code, and to prepare a fallback in case fetching objects from a specific store fails!
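
One way such a fallback could look is sketched below. It is deliberately library-agnostic: each store is assumed to be a plain callable that takes a key and returns bytes, and `fetch_with_fallback` and `AllStoresFailed` are hypothetical names, not part of any AWS SDK.

```python
from typing import Callable, Sequence


class AllStoresFailed(Exception):
    """Raised when none of the configured stores could return the object."""


def fetch_with_fallback(key: str, stores: Sequence[Callable[[str], bytes]]) -> bytes:
    """Try each store in order and return the first successful result."""
    last_error = None
    for fetch in stores:
        try:
            return fetch(key)
        except Exception as exc:  # real code should catch the store's specific errors
            last_error = exc
    raise AllStoresFailed(
        f"could not fetch {key!r} from {len(stores)} store(s)"
    ) from last_error


if __name__ == "__main__":
    def primary(key: str) -> bytes:
        # Hypothetical primary store that is currently unavailable.
        raise ConnectionError("primary store unavailable")

    def replica(key: str) -> bytes:
        # Hypothetical secondary store, e.g. a bucket in another region.
        return b"object from replica"

    print(fetch_with_fallback("report.csv", [primary, replica]))
```

The order of `stores` encodes the preference: the primary store is tried first, and the replica is only contacted when the primary raises.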

A question you can ask yourself: How often is my team’s system up and running? And do I have the data available to find out?
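
If you already collect periodic health checks, a rough answer is only a division away. The sketch below assumes evenly spaced checks and treats every failed check as downtime for its whole interval; the function name and the numbers are made up for illustration.

```python
def measured_availability(successful_checks: int, total_checks: int) -> float:
    """Return availability in percent from evenly spaced health checks."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    if not 0 <= successful_checks <= total_checks:
        raise ValueError("successful_checks must be between 0 and total_checks")
    return 100 * successful_checks / total_checks


if __name__ == "__main__":
    # Hypothetical data: one check per minute for 30 days, 43 of them failed.
    total = 30 * 24 * 60
    failed = 43
    print(f"{measured_availability(total - failed, total):.3f}%")
```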

Did you know that AWS will refund part of your bill if the SLA is not met? In the case of S3, they will credit part of the bill if the monthly availability falls below 99.9%.

In this post I talked about availability and translated it into its time representation. When S3 talks about durability, this translates into data being lost in a bucket. The more data you have, the higher your chances that some of it gets lost over time.
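
To get a feeling for durability numbers, here is a small sketch. It assumes that durability is stated per object and per year as a number of 9’s (AWS documents S3 Standard as designed for eleven 9’s, i.e. 99.999999999%), and that losses are independent; the function name is illustrative.

```python
def expected_objects_lost_per_year(object_count: int, durability_nines: int) -> float:
    """Return the expected number of objects lost per year.

    Assumes an annual per-object loss probability of 10^-durability_nines
    and independent losses, which is a simplification.
    """
    annual_loss_probability = 10.0 ** -durability_nines
    return object_count * annual_loss_probability


if __name__ == "__main__":
    for count in (10_000_000, 1_000_000_000, 100_000_000_000):
        lost = expected_objects_lost_per_year(count, durability_nines=11)
        print(f"{count:>15,} objects: about {lost:g} lost per year")
```

With 10,000,000 objects and eleven 9’s, this works out to roughly 0.0001 objects per year, or one lost object every 10,000 years on average. The expected loss grows linearly with the number of objects stored.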