I noticed Cloudera recommends placing nodes (such as masters) in different AZs of a region. How can we do this in the cluster.conf file?
The recommendation to deploy across availability zones for HA is very recent and isn't yet supported directly in Director. Each environment definition for AWS only defines a single subnet, which corresponds to a single AZ.
Thanks for this info Bill. I was wondering about it and was about to post a similar question.
Does Cloudera have any guidelines on using multiple AZs (in one region)?
My current thinking is:
1) Two (or more) separate clusters - one in each AZ
2) A single Cloudera Manager in one AZ which controls all.
3) A single Cloudera Director in one AZ which creates all instances
4) A single Cloudera Navigator in one AZ which monitors/audits all.
If we could turn some or all of these (2-4) into a multi-AZ HA setup, that would be really cool.
I am a bit concerned about cross-region network traffic, but perhaps I should save that for my own thread.
Based on the terminology used, I'm assuming that you are referring to AWS.
Please refer to the Cloudera Enterprise Reference Architecture for AWS Deployments. "Appendix A: Spanning AWS Availability Zones" will provide the guidance you are looking for.
In Director, each Environment corresponds to a single region, but you can (indirectly) select the AZ used for each node by setting the subnet in the InstanceTemplate. Refer to the aws.reference.conf for an example of where you can override the subnetId and set the rackId. Note that, since a single InstanceTemplate is used for each instance group, you will need to make separate instance groups for each AZ.
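As a rough illustration of the approach described above, here is a minimal sketch of one instance group per AZ. Only subnetId and rackId come from this thread; the template name, group names, instance type, counts, and all IDs are placeholders I made up, and the surrounding layout is an assumption that should be checked against aws.reference.conf for your Director version. Roles and other required settings are left out.

```
# Hypothetical sketch: names and IDs below are placeholders, not real values.
instances {
    base {
        type: m4.xlarge            # placeholder instance type
        image: ami-REPLACE-ME      # placeholder AMI
    }
}

cluster {
    # One instance group per AZ, because each group uses a single template.
    workersAzA {
        count: 3
        instance: ${instances.base} {
            subnetId: subnet-REPLACE-ME-A   # a subnet located in the first AZ
            rackId: "/az-a"                 # rack label for nodes in that AZ
        }
    }

    workersAzB {
        count: 3
        instance: ${instances.base} {
            subnetId: subnet-REPLACE-ME-B   # a subnet located in the second AZ
            rackId: "/az-b"
        }
    }
}
```

The AZ is never named directly here; it follows from whichever AZ each subnet was created in.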
It is possible but not recommended. In AWS you can use VPC peering. Remember to add the appropriate CIDR ranges to your route tables.