Part 1: The Setup — Building a 5–nines (99.999%) Available Platform
As engineers, we strive to build platforms that are available at all times.
It is exciting to work on mission-critical platforms and solve system design problems that balance throughput, availability, and the cost of keeping the lights on.
The more pessimistic you are about the availability of your infrastructure, applications, and integrations, the more it is going to cost you to meet the SLOs and associated SLIs of each of your application components.
Refer to my article for a quick introduction to SLAs, SLOs, and SLIs, and for the math and documentation behind building an SLA (technically SLO) table for your application.
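To make the 5-nines target concrete, here is a minimal sketch of the arithmetic behind an availability target. It is an illustration rather than the method from the referenced article: the function names are hypothetical, a year is taken as 365 days, and components are assumed to fail independently and to all be required (a serial chain).

```python
# Illustrative availability arithmetic; names and assumptions are hypothetical.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def serial_availability_pct(component_pcts: list[float]) -> float:
    """Composite availability when every component must be up.

    Assumes independent failures, so the availabilities multiply.
    """
    composite = 1.0
    for pct in component_pcts:
        composite *= pct / 100
    return composite * 100


if __name__ == "__main__":
    # Downtime budget for three common targets.
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% -> {budget:.2f} minutes of downtime per year")

    # Three 5-nines components in series no longer add up to 5 nines.
    chain = [99.999, 99.999, 99.999]
    print(f"serial chain -> {serial_availability_pct(chain):.4f}%")
```

Under these assumptions, 99.999% leaves a budget of roughly 5.26 minutes per year, and chaining three such components in series drops the composite to about 99.997%. That is why each component's SLO has to be stricter than the platform-level target.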
In this 5-part series, we are going to look at the anatomy of a typical 5-nines application platform, and understand what it takes to build one and the constraints on achieving Maximum Availability.
Before Cloud (in "BC" times), only large corporations even considered developing highly available platforms. Doing so required significant capital expense, enterprise licenses such as Oracle HA and SQL Server Replication, and the luxury of multiple data centers separated by 50 to 250 miles, depending on the industry and associated regulations.