Global Infrastructure – Part-1: Regions & Availability Zones – AWS | Azure | Google Cloud

Code: C01-P1 | Author: Saransundar N | Applies To: AWS | Azure | Google Cloud

Understanding the global infrastructure offered by cloud providers is very important. It helps you plan where to host your workloads for data compliance and high availability, and how to prepare for disaster recovery. In this post, we will discuss Regions and Availability Zones. After the concepts, we will work through 5 interesting scenarios on Availability Zones. In the next post, we will see the “Criteria for choosing the right region”.

About Region:

A Region is a location where a cloud provider such as AWS, Azure, or Google Cloud offers its services. In simple words, it is a collection of cloud resources in a geographic area. A Geography is an area of the world that contains one or more regions. Cloud providers offer multiple regions that are isolated and independent from each other. For example, India (Mumbai) is a region in the APAC Geo, and N. Virginia is a region in US East.

*Regions’ presence across Cloud Providers*

# of Regions across Geographic area:

All three cloud providers have started spreading their wings across Geos – US, APAC, Europe, Africa, and the Middle East. US and APAC have more regions. The overall count includes Gov Cloud and the China region. The graph provides the count of regions or locations across Geographies offered by AWS, Azure, and Google Cloud. To get the complete list, click here.

What is interesting about hosting in multiple regions? What does it provide?

The main benefits are compliance with data residency requirements, fault tolerance (handling operational failures), stability, resilience (adapting well to changes), and, very importantly, lower latency for your end users.

Placing your resources in different regions provides an even higher degree of failure independence. Cloud providers offer regions or locations all over the world, but you decide where to place your data.

Just think of it! A region has a collection of data centers that provide cloud services and house cloud resources such as virtual machines, storage, and database services.

Features	AWS	Azure	GCP
Region	27	60+	35
Representation	US East(N. Virginia) us-east-1	Virginia East US 2	N. Virginia us-east4

Tab. Region or Location across cloud providers

Need and use of Availability Zones (AZ):

To organize the placement of data centers, the availability zones concept came into the picture. In simple words, it is a group of one or more data centers. Now you can say that each Region has multiple, isolated locations known as Availability Zones or Zones.

*Availability Zones (AZs) within a region*

What any business needs most is a service designed for high availability, or “HA” for short. Imagine you are not able to reach a critical website or application for a few minutes. Most of us know the pain when a site or app is unavailable. Similarly, when you host your own application services, they should be built for HA. Availability Zones are designed for exactly that.

Some of the interesting facts about Zones are:

The zones are isolated from each other; if any one zone fails, you have other zones for support
The zones are physically separated, but ideally all zones of a region lie within a 100 km radius (approx. 60 miles)
The zones are connected by high-bandwidth, low-latency networking over fully redundant metro fiber, which provides high throughput
If an application is partitioned across zones, companies are better isolated and protected from issues such as power outages, lightning strikes, tornadoes, earthquakes, and more

AWS-Availability Zones:

AWS provides Availability Zones (AZs) in all of its regions; each region has a minimum of 3 Availability Zones.
The US East (Northern Virginia) Region has 6 AZs; the US West (Oregon) and Asia Pacific (Tokyo and Seoul) Regions have 4 AZs each; the rest of the regions have 3 AZs.
In AWS, each AZ has an AZ ID that refers to the physical location of the zone. It is written as the region short code followed by an ID. Ex: use1-az1 for us-east-1; aps1-az1 for ap-south-1
The logical name of an Availability Zone is the region code followed by a zone letter. Ex: us-east-1a; us-east-1b; ap-south-1a
The physical locations of zones are logically mapped to the Availability Zones for each cloud account. Look at scenario 2 to see how the physical location of a zone relates to the logical zone name.
The networking between AZs is implemented with redundant, low-latency fiber connectivity. All traffic between AZs is encrypted. The network performance is sufficient to accomplish synchronous replication between AZs.

Azure-Availability Zones:

Azure offers Availability Zones in 28+ regions today (not all regions) and is expanding its footprint across other regions.
In the regions where Availability Zones are not available, Azure uses a concept called Availability sets. Azure also provides a region-level high-availability option called Region pairs. To understand more, click here.
Each of these regions has 3 Availability Zones, and each zone is represented as “Zone” followed by an ID. Ex: Zone 1

Google Cloud – Zones:

In Google Cloud, they are simply called Zones. Each region has 3 zones except Central US, which has 4 Zones.
All Google Cloud hardware is organized into clusters. A cluster represents a set of compute, network, and storage resources supported by a building, power, and cooling infrastructure.
Google maps public zones to clusters of internal physical hardware within data centers. This is called ‘Zone virtualization’.
Google Cloud applies zone virtualization as standard practice. Scenario 2 shows a simple example of how zones are mapped for multiple cloud projects (aka Cloud accounts).

*Fig. Availability Zones count over AWS, Azure, and Google Cloud*

AWS has 87 Availability Zones across 27 Regions; Azure has 81 Availability Zones across 27 Regions; and Google Cloud has 106 Zones across 35 Regions. These numbers keep changing as the cloud providers extend their footprint, so refer to each provider's site for the latest numbers.

If you host your workloads across multiple zones, they are better isolated from failures within a region. In general, the SLA offered for a single instance or VM is 99.5%; for VMs or instances hosted across multiple Zones, the SLA offered is 99.99%, or ‘four nines’ (which allows about 4.38 minutes of downtime per month). Let’s see the SLA (Service Level Agreement) offered by cloud providers.

SLA	AWS	Azure	Google Cloud
For single VM	>= 99.5%	>= 99.5% to 99.9% (varies on disk type)	>= 99.5%
For VMs across multiple Zones	>= 99.99%	>= 99.99%	>= 99.99%

Tab. SLAs for instances/VMs across Zones
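
To see where the “4.38 minutes per month” figure comes from, the SLA percentages can be converted into an allowed downtime. The snippet below is a minimal sketch that assumes an average month of 365.25 × 24 × 60 / 12 minutes; the function name is made up for illustration, and each provider defines and measures uptime in its own SLA terms.

```python
# Average month length in minutes (assumption: 365.25 days per year).
MINUTES_PER_MONTH = 365.25 * 24 * 60 / 12


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Maximum monthly downtime, in minutes, that still meets the given SLA."""
    return MINUTES_PER_MONTH * (1 - sla_percent / 100)


# Compare the single-VM and multi-zone SLA levels from the table.
for sla in (99.5, 99.9, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes per month")
```

With these assumptions, 99.99% works out to roughly 4.38 minutes per month, while 99.5% allows more than three and a half hours.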

Let us look into the various scenarios on the Availability Zones to understand better.

Scenario: 1 – Knowing the boundary of resources

You have created a Virtual Machine VM1 in Zone-1. You want to attach an SSD-based disk from Zone-2 to VM1. Is that possible?

Solution: It is always important to understand the boundary of cloud resources be it Global, regional, or zonal. Global resources are available across any region like DNS services. Regional resources can be used by any resource in that region, regardless of zone, while zonal resources can only be used by other resources in the same zone.

So the answer is no. The virtual machine (VM) is a zonal resource, and the virtual disk for the VM has to be in the same zone. If you create the disk or volume in Zone-1, you can only attach it to VM instances that are in the same Availability Zone, Zone-1.
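
The zonal rule can be sketched as a simple check. This is only an illustration: the `ZonalResource` class, the `can_attach` function, and the region name `east-us` are hypothetical and do not correspond to any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZonalResource:
    """A resource that lives in exactly one zone of one region (hypothetical model)."""
    name: str
    region: str
    zone: str


def can_attach(vm: ZonalResource, disk: ZonalResource) -> bool:
    """A zonal disk can be attached only to a VM in the same region and zone."""
    return vm.region == disk.region and vm.zone == disk.zone


vm1 = ZonalResource("VM1", "east-us", "Zone-1")
disk_same_zone = ZonalResource("ssd-a", "east-us", "Zone-1")
disk_other_zone = ZonalResource("ssd-b", "east-us", "Zone-2")

print(can_attach(vm1, disk_same_zone))   # True
print(can_attach(vm1, disk_other_zone))  # False
```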

Scenario: 2 – Zones allocation across Cloud accounts

Let’s consider a scenario. There is a cloud provider “XYZ” and customers “B” & “C” are consumers.
East US region has 3 availability zones Zone-1a, Zone-1b, and Zone-1c;
Customer B has chosen East US region and hosted their Virtual Machines in Zone-1a and Zone-1b;
Customer C has chosen East US region and hosted their Virtual Machines in Zone-1b and Zone-1c.
One physical zone of cloud provider “XYZ” fails because of a natural disaster. In this case, both Customer B (in “Zone-1a”) and Customer C (in “Zone-1c”) are impacted by the non-availability of a single physical zone.

Guess why?

You might expect the failure of a single zone to impact only one of these two customers, since Zone-1a and Zone-1c look like different zones. There is a catch in this. It is important to understand how a cloud provider maps Availability Zones to your cloud account or subscription. To know more about Cloud accounts, see my post.

Solution:
Cloud providers independently map physical Availability Zones to zone codes for each cloud account/subscription. Look at the left part of the figure below: Customer B is using two zones, 1a and 1b, and Customer C is using two zones, 1c and 1b. But 1a of Customer B and 1c of Customer C are mapped to the same physical zone, use1-az1. Hence, in the above scenario, both customers are impacted when use1-az1 fails.

*Fig. Availability Zones – Physical to logical mapping*

The other half of the figure shows the logical representation for the AWS Mumbai region (ap-south-1) for your reference. There the mapping looks the same for both customers B and C, but this is not the case in all scenarios.
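
The per-account mapping in this scenario can be sketched with two small dictionaries. Only the mapping of Customer B's 1a and Customer C's 1c to use1-az1 comes from the scenario; the other entries, the account names, and the function name are assumptions made up for illustration.

```python
# Hypothetical mapping of logical zone names to physical AZ IDs, per account.
# Only B's "1a" and C's "1c" -> "use1-az1" comes from the scenario above.
ZONE_MAP = {
    "customer-b": {"1a": "use1-az1", "1b": "use1-az2", "1c": "use1-az3"},
    "customer-c": {"1a": "use1-az3", "1b": "use1-az2", "1c": "use1-az1"},
}

# Logical zones in which each customer hosts virtual machines.
DEPLOYMENTS = {
    "customer-b": ["1a", "1b"],
    "customer-c": ["1b", "1c"],
}


def impacted_by(physical_zone_id: str) -> dict[str, list[str]]:
    """Return, per account, the logical zones that sit on the failed physical zone."""
    impacted = {}
    for account, logical_zones in DEPLOYMENTS.items():
        hit = [z for z in logical_zones if ZONE_MAP[account][z] == physical_zone_id]
        if hit:
            impacted[account] = hit
    return impacted


print(impacted_by("use1-az1"))
# {'customer-b': ['1a'], 'customer-c': ['1c']}
```

Comparing physical zone IDs rather than logical names is what reveals that two accounts share a failure domain.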

Let’s see how this mapping works in Azure cloud and Google Cloud.

Azure Cloud zone Mapping across subscriptions:

In Azure cloud, the logical mapping of AZs varies across subscriptions. You can verify the mapping between two subscriptions with the help of the Check-ZonePeers API. An Azure subscription is automatically assigned its AZ mapping at the time the subscription is created. In the snippet below, Availability Zone-1 of Subscription-1 is logically mapped to Availability Zone-3 of Subscription-2.

*Fig. Azure Availability ZonePeers across subscriptions*

Google Cloud Zone mapping across projects:

In the figure shown below, each zone is supported by multiple clusters. Google Cloud aims to group clusters with shared infrastructure, such as a building or cooling infrastructure, into logical zones so that shared infrastructure failures affect only one zone within a region.

*Fig. Virtualized zones in Google Cloud* Source: Google

Customer workloads are maintained in the fewest number of clusters possible. Usually, the zonal workload is contained in a single cluster. However, zone-to-cluster mappings might include additional clusters in cases where additional capacity or specialized hardware is not available in the primary cluster for the map. There are three zones A, B, and C in asia-east1 region. In the above figure,

Project Fizz has two clusters mapped to asia-east1-a because only Cluster z supports GPU workloads and only Cluster y supports TPU workloads.
Project Fizz and Project Buzz have different clusters mapped to asia-east1-b.
Project Fizz and Project Buzz have the same cluster mapped to asia-east1-c.

Scenario: 3 – Placement of Servers across Availability Zones

There are 11 servers to be hosted in the cloud for a business application that needs high availability as well as resiliency. There are 9 web servers and 2 database servers, and all the web servers do the same job. How do you plan the placement of the servers in a region that has 4 availability zones?

Solution: Here the answer is simple. Distribute the servers doing the same job evenly across zones, and use as many zones as possible to improve availability. In this case, one zone will have 3 web servers and the other three zones will have 2 web servers each. Place the two database servers in different zones as well, starting with the zone that has the most web servers.

Zone-1	Zone-2	Zone-3	Zone-4	Total
3 Servers	2 Servers	2 Servers	2 Servers	9 Web Servers
1 Database Server	1 Database Server			2 DB Servers

Tab. Servers distributed across zones
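
The even spread in the table can be reproduced with a simple round-robin assignment. This is a minimal sketch; the function name is hypothetical, and real placement would also weigh capacity, cost, and replication requirements.

```python
def distribute(server_count: int, zones: list[str]) -> dict[str, int]:
    """Assign servers to zones round-robin and return the count per zone."""
    placement = {zone: 0 for zone in zones}
    for i in range(server_count):
        placement[zones[i % len(zones)]] += 1
    return placement


zones = ["Zone-1", "Zone-2", "Zone-3", "Zone-4"]

web = distribute(9, zones)
db = distribute(2, zones)

print(web)  # {'Zone-1': 3, 'Zone-2': 2, 'Zone-3': 2, 'Zone-4': 2}
print(db)   # {'Zone-1': 1, 'Zone-2': 1, 'Zone-3': 0, 'Zone-4': 0}
```

Round-robin guarantees that no zone holds more than one server above any other, so losing a single zone takes out at most 3 of the 9 web servers and 1 of the 2 database servers.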

Scenario: 4 – Network latency among resources across Zones

There are two web servers hosted in the East US region. The ‘Bairav-web-az1a’ server is hosted in zone 1a. The ‘Bairav-web-az1b’ server is hosted in zone 1b. The servers are of the same configuration (m5.large – 2 cores; 8 GB RAM). You can see the IP addresses highlighted below. What is the network latency between these two servers across zones 1a and 1b?

*Fig. Two web servers hosted across Zones*

Solution:

In general, latency varies based on the instance type, the network routing methods across zones, the workloads/apps running on the instance, and the network speed offered by each instance. In this scenario, a ping between the two servers shows sub-millisecond latency: the round trip takes less than 1 ms. This test was done in the AWS cloud for the instance configuration specified.

*Fig. Latency test between servers across zones*

Note: Similarly in Azure, availability zones are connected by a high-performance network with a round-trip latency of less than 2ms as per official documentation.
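
If ICMP ping is blocked between the servers, a rough alternative is to time TCP handshakes. The sketch below is not the ping test used above; it approximates one round trip per connection, and the function names are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP handshake to host:port and return it in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed immediately; only the handshake is timed
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Return the minimum, median, and maximum of the latency samples."""
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }
```

For example, running `summarize([tcp_connect_ms(peer_ip, 22) for _ in range(10)])` from one server, where `peer_ip` is the private IP of the server in the other zone and port 22 is assumed to be open, gives a quick view of cross-zone latency.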

Scenario: 5 – Impact of an Availability Zone failure & a few real-world incidents

There have been incidents reported across cloud providers where a zone-level failure impacted cloud services due to loss of power, failure of cooling systems, network traffic congestion, failed system-level updates, and so on. Let’s look at some real-world incidents; you can find more details through the links provided. The table below is just for a high-level understanding and doesn’t cover all incidents.

Cloud	Incident Description	Details & Timestamp	Impact
Google Cloud	Failure of multiple redundant cooling systems in one of the data centers in europe-west2-a	Start: 19-Jul-22; 06:33 End: 20-Jul-22; 21:20 Duration: 1 day, 14 hours, 47 minutes	Multiple Cloud products experienced elevated error rates, latencies, or service unavailability in europe-west2. Click here to know more.
AWS	Power failure that disrupted services located within Availability Zone 1 (AZ1) in the US-EAST-2 Region	Start: 28-Jul-22; 17:00 UTC Duration: 20 Mins	Outage affected connectivity to and from the region and brought down Amazon’s EC2 instances, which impacted applications such as Webex, Okta, Splunk, BambooHR, and others. To read more, click here.
AWS	Cooling failure caused a small percentage of EC2 servers in a single AZ in Tokyo to shut down due to overheating.	Start: 23-Aug-2018; 12:36 PM JST	This resulted in impaired EC2 instances and degraded EBS volume performance for some resources in the affected area of the Availability Zone. Click here for more.
AWS	Data center power loss brings down AWS services, in another East Coast cloud outage; 3rd outage in the same month	Start: 22-Dec-21; 4:35 AM PST	Increased EC2 launch failures and networking connectivity issues for some instances in a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region. Click here for more details. For news, read this. https://www.theregister.com/2021/12/22/aws_outage/

Tab. Real-world incidents in the Availability Zones

Summary: Regions and Availability Zones across AWS, Azure, and Google Cloud

Features	AWS	Azure	Google
Region Name	Region	Location	Region
# of Regions	27	60+	35
Region Representation	Ex: us-east-1; ap-south-1	Ex: East US 2; South India; UK South	Ex: us-west2; asia-south1
Availability zone Name	Availability Zones AZ	Availability Zones	Zones
# of Zones	87	81 (in 27 Regions)	106
Availability Zone representation	Physical: use1-az1 Logical: us-east-1a (a, b, c for zone #)	Zone 1, Zone 2, Zone 3	us-west2-a us-west2-b
Upcoming or Planned regions	7 Regions + 21 AZs	5 Regions + Expanding AZs in other regions	8 Regions + 24 Zones

Tab. Regions and Availability Zones comparison