Best practices for spanning AWS availability zones (or equivalent at other cloud providers)

Are there HDP applications where latency between availability zones (AZ) (approx. 1 ms) is significant? It seems like rack awareness could be used, treating each AZ as a different rack.

  • Is this the common way to handle this in practice?
  • Does anyone have examples of SLAs for clusters with and without multiple AZs?
  • Anything else to be aware of regarding EC2 AZs (or the equivalents at other cloud providers)?
1 ACCEPTED SOLUTION

Alex, I would not recommend that customers deploy clusters across availability zones. While it is technically feasible to use rack awareness to segregate racks per AZ, I haven't seen us recommend this in the past, and other distribution providers even go as far as to say that multi-AZ deployment is not supported.
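
For readers wondering what "rack awareness per AZ" could look like in practice, here is a minimal sketch of a Hadoop topology script, the kind of executable that `net.topology.script.file.name` in core-site.xml points to. Hadoop passes one or more IP addresses or hostnames as arguments and reads one rack path per argument from stdout. Everything specific here is an assumption for illustration: the subnet CIDRs, the AZ names, and the idea that each AZ has its own subnet. It shows the mechanism only and is not a recommendation to deploy this way.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop topology script that reports one rack per availability zone."""
import ipaddress
import sys

# ASSUMPTION: each AZ owns exactly one subnet. These CIDRs and AZ names are
# invented for illustration; replace them with the values from your own VPC.
SUBNET_TO_RACK = {
    "10.0.1.0/24": "/us-east-1a",
    "10.0.2.0/24": "/us-east-1b",
    "10.0.3.0/24": "/us-east-1c",
}
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for an IP address, or the default rack if unknown."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hadoop may pass a hostname instead of an IP; this sketch does not
        # resolve names and simply falls back to the default rack.
        return DEFAULT_RACK
    for cidr, rack in SUBNET_TO_RACK.items():
        if addr in ipaddress.ip_network(cidr):
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Print one rack path per argument, in the order the arguments were given.
    for host in sys.argv[1:]:
        print(resolve_rack(host))
```

Run by hand, `./topology.py 10.0.1.17 10.0.2.5` would print one rack path per line. With each AZ reported as a rack, HDFS block placement spreads replicas across AZs, which is also where the cross-AZ traffic and latency concerns in this thread come from.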

5 REPLIES

Master Mentor

@Alex Miller I doubt that you will find an exact answer to this. This is a good starting point, and based on your use case you can gather more data.

OK, across AWS Regions I understand, but it seems like spanning AZs should have minimal performance impact (latency isn't much higher) and would provide redundancy for HA.

Either way, I'm glad to hear feedback from what is seen in the field and from other providers.

New Contributor

Greetings @Paul Codding, it has been a few years since the last activity on this thread, and our team is wondering whether Hortonworks still recommends against spanning multiple availability zones to implement Hadoop high availability in AWS.

In a recent post on the subject, @fschneider replied "that in case of HA clusters the HA nodes should be launched in different availability zones". https://community.hortonworks.com/questions/176198/will-single-availability-zone-provide-high-availa...

Other vendors are recommending a deployment methodology that spans AWS availability zones while also noting data transfer costs, network latency and throughput considerations.

Many thanks in advance!

New Contributor

Amazon EC2 recently introduced Partition Placement Groups for rack-aware applications:

https://aws.amazon.com/blogs/compute/using-partition-placement-groups-for-large-distributed-and-repl...
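
As a rough illustration of how partition information could feed Hadoop rack awareness, the sketch below turns a response shaped like the EC2 `DescribeInstances` output (as returned by boto3's `describe_instances`) into an IP-to-rack mapping. The function names and the rack naming scheme are my own assumptions, not something taken from the AWS post. The code works on a plain dictionary, so it makes no AWS calls itself.

```python
def rack_for_instance(instance):
    """Build a single-level rack name from an instance's placement data.

    ASSUMPTION: the naming scheme "/<az>-partition-<n>" is invented here.
    Racks are kept at one level so every node sits at the same topology depth.
    """
    placement = instance.get("Placement", {})
    az = placement.get("AvailabilityZone")
    partition = placement.get("PartitionNumber")
    if az is None:
        return "/default-rack"
    if partition is None:
        # Instance is not in a partition placement group: fall back to AZ only.
        return f"/{az}"
    return f"/{az}-partition-{partition}"


def build_topology(response):
    """Map each private IP in a DescribeInstances-shaped dict to a rack name."""
    topology = {}
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip:  # skip instances that have no private IP yet
                topology[ip] = rack_for_instance(instance)
    return topology
```

The resulting dictionary could be written to a lookup file that a topology script reads, so that HDFS treats each partition as a rack.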