
Best practices for spanning AWS availability zones (or equivalent at other cloud providers)


Are there HDP applications where the latency between availability zones (AZs), approximately 1 ms, is significant? It seems like rack awareness could be used here, treating each AZ as a separate rack.

  • Is this the common way to handle this in practice?
  • Does anyone have examples of SLAs for clusters with and without multiple AZs?
  • Anything else to be aware of regarding EC2 AZs (or the equivalents at other cloud providers)?
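The "each AZ is a rack" idea maps onto Hadoop's rack-awareness hook. Hadoop runs the script named by `net.topology.script.file.name` (in core-site.xml) with one or more IP addresses or hostnames as arguments, and it expects one rack path per argument on stdout. Below is a minimal sketch of such a script. It assumes one subnet per AZ; the subnets and AZ names are hypothetical placeholders, and hostnames are not resolved.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop rack-topology script that treats each AWS AZ as a rack."""
import ipaddress
import sys

# Hypothetical mapping: one VPC subnet per availability zone.
SUBNET_TO_AZ = {
    "10.0.1.0/24": "us-east-1a",
    "10.0.2.0/24": "us-east-1b",
    "10.0.3.0/24": "us-east-1c",
}
# Rack reported for anything we cannot place (single-level, like the AZ racks).
DEFAULT_RACK = "/default-rack"

NETWORKS = [(ipaddress.ip_network(cidr), az) for cidr, az in SUBNET_TO_AZ.items()]


def rack_for(host: str) -> str:
    """Return the rack path ("/<az>") for an IP address, or the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch does not resolve hostnames.
        return DEFAULT_RACK
    for net, az in NETWORKS:
        if addr in net:
            return "/" + az
    return DEFAULT_RACK


if __name__ == "__main__":
    # Print one rack path per argument, in the order Hadoop passed them.
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

All rack paths are kept at a single level (`/us-east-1a`, `/default-rack`). Hadoop's topology expects every node to sit at the same depth, so mixing one-level and two-level paths would cause problems.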

Master Mentor

@Alex Miller I doubt you will find an exact answer to this. This is a good starting point, and based on your use case, you can gather more data.



OK, I understand the concern about spanning AWS Regions, but spanning AZs should have minimal performance impact (the latency isn't much higher) and would provide redundancy for HA.

Either way, I'd be glad to hear feedback on what people see in the field and with other cloud providers.

New Contributor

Greetings @Paul Codding. It has been a few years since the last activity on this thread, and our team is wondering: is it still the case that Hortonworks does not recommend spanning multiple availability zones to implement Hadoop high availability in AWS?

In a recent post on the subject, @fschneider replied "that in case of HA clusters the HA nodes should be launched in different availability zones": https://community.hortonworks.com/questions/176198/will-single-availability-zone-provide-high-availa...
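One concrete reason to put HA nodes in different AZs is quorum. NameNode HA with the Quorum Journal Manager needs a majority of JournalNodes, and ZooKeeper, which is used for automatic failover, needs a majority of its ensemble. The minimal sketch below checks whether a given placement still has a majority after losing any single AZ. The host and AZ names are hypothetical.

```python
from collections import Counter


def survives_single_az_loss(placement: dict[str, str]) -> bool:
    """True if a majority of quorum members remain after losing any one AZ.

    placement maps a quorum member (e.g. a JournalNode or ZooKeeper host)
    to the availability zone it runs in.
    """
    if not placement:
        return False
    total = len(placement)
    majority = total // 2 + 1
    per_az = Counter(placement.values())
    # Losing an AZ removes every member placed in it.
    return all(total - lost >= majority for lost in per_az.values())


# Hypothetical placements of three JournalNodes.
three_azs = {"jn1": "us-east-1a", "jn2": "us-east-1b", "jn3": "us-east-1c"}
two_azs = {"jn1": "us-east-1a", "jn2": "us-east-1a", "jn3": "us-east-1b"}
```

With three JournalNodes spread over three AZs, any one AZ can fail and two of three remain. With the same three nodes in only two AZs, losing the AZ that hosts two of them leaves one node, which is not a majority.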

Other vendors recommend a deployment methodology that spans AWS availability zones, while also noting data transfer costs, network latency, and throughput considerations.
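The data-transfer cost point can be estimated roughly. Suppose each AZ is a rack, the writer runs on a DataNode, and HDFS uses its default placement for replication 3. In that case, one replica is written locally and the other two land on two nodes in a single remote rack. The write pipeline then crosses an AZ boundary once per block. The sketch below encodes that assumption. The per-GB price is a placeholder parameter, not an AWS figure, so take the real rate from current AWS pricing.

```python
def cross_az_replication_gb(written_gb: float, replication: int = 3) -> float:
    """Estimate GB sent across AZs by HDFS write pipelines.

    Assumes AZ == rack, the writer runs on a DataNode, and the default
    block placement: replica 1 local, replicas 2 and 3 (if any) in one
    remote rack. The pipeline then makes exactly one cross-rack hop.
    """
    if replication == 1:
        return 0.0  # only the local replica; nothing leaves the AZ
    if replication in (2, 3):
        return float(written_gb)  # one cross-AZ copy of every byte written
    raise ValueError("sketch only models replication factors 1 to 3")


def cross_az_cost(written_gb: float, price_per_gb: float, replication: int = 3) -> float:
    """Multiply the estimated cross-AZ volume by a caller-supplied rate."""
    return cross_az_replication_gb(written_gb, replication) * price_per_gb
```

Under these assumptions, writing 500 GB a day moves roughly another 500 GB a day across AZs. That figure does not include non-local reads or shuffle traffic between tasks in different AZs, which add to the total.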

Many thanks in advance!

New Contributor

Amazon EC2 recently introduced Partition Placement Groups for rack-aware applications:

https://aws.amazon.com/blogs/compute/using-partition-placement-groups-for-large-distributed-and-repl...
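With a partition placement group, the partition number can serve as the Hadoop rack, because partitions in a group do not share underlying racks. The topology script runs on the NameNode and receives other hosts' addresses, so it needs a lookup table. The sketch below assumes you have already collected each instance's AZ and partition number into a hypothetical `INVENTORY`, for example from instance metadata or from your provisioning tooling.

```python
#!/usr/bin/env python3
"""Sketch of a topology script using EC2 partition numbers as racks."""
import sys

# Hypothetical inventory: private IP -> (availability zone, partition number).
INVENTORY = {
    "10.0.1.10": ("us-east-1a", 1),
    "10.0.1.11": ("us-east-1a", 2),
    "10.0.1.12": ("us-east-1a", 3),
}
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Return a single-level rack path like "/us-east-1a-partition-2"."""
    entry = INVENTORY.get(host)
    if entry is None:
        return DEFAULT_RACK
    az, partition = entry
    # Flat path keeps every node at the same topology depth as DEFAULT_RACK.
    return f"/{az}-partition-{partition}"


if __name__ == "__main__":
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

This layout protects against the failure of a rack inside an AZ. On its own, it does not guarantee that replicas end up in different AZs, because two partitions in the same AZ count as two distinct racks.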