We want to set up a 6-node cluster (HDP 2.4) with services like HDFS, MapReduce, YARN, Hive, Oozie, ZooKeeper, Knox, and Ranger, with Kerberos enabled.
The configuration of these nodes is listed below.
2 machines with:
2x E5-2670V2 2.5GHz-25MB 10C CPU
448GB PC3-14900L RAM
2x 80GB 6G SATA SSD RAID 1
and 4 machines with:
2x E5-2660V3 2.5GHz-25MB 10C CPU
128GB PC4-2133P RAM
2x 500GB 6G SATA 7.2k HDD RAID 1
10x 2TB 6G SATA 7.2k HDD JBOD
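For a rough sense of the storage these four machines provide, here is a minimal sketch of the arithmetic. It assumes that all ten JBOD disks on each machine are given to HDFS, that the HDFS default replication factor of 3 is kept, and that space reserved for non-DFS use is ignored. The function name is only illustrative.

```python
def hdfs_capacity_tb(nodes, disks_per_node, disk_tb, replication=3):
    """Return (raw, usable) capacity in TB for the data disks.

    Assumption: every data disk is dedicated to HDFS and no space
    is reserved for non-DFS use.
    """
    raw = nodes * disks_per_node * disk_tb
    usable = raw / replication
    return raw, usable


# 4 machines, each with 10x 2TB JBOD disks
raw_tb, usable_tb = hdfs_capacity_tb(nodes=4, disks_per_node=10, disk_tb=2)
print(f"raw: {raw_tb} TB, usable at 3x replication: {usable_tb:.1f} TB")
```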
The idea is to use the two machines with 448 GB of RAM for master services such as NameNode, ResourceManager, JobHistory Server, HiveServer2, etc.
One master will host the NameNode alone, and the other will host the second NameNode (for NameNode high availability) together with the other master services such as ResourceManager and HiveServer2. Is it advisable to install Ambari and Hue on one of the masters, or should we use a VM for that, as an edge node?
Where should Ranger and Knox go? On one of the masters?
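To make the plan concrete, here is a rough sketch of the layout described above, written as a simple mapping. The hostnames are placeholders, and putting DataNode and NodeManager on the four JBOD machines is my assumption about how they would be used. The services I have not placed yet are listed separately.

```python
# Hostnames are placeholders, not real machines.
planned_layout = {
    "master1": ["NameNode"],
    "master2": ["NameNode (HA pair)", "ResourceManager", "HiveServer2"],
}

# Assumption: the four machines with JBOD disks act as worker nodes.
for i in range(1, 5):
    planned_layout[f"worker{i}"] = ["DataNode", "NodeManager"]

# Services whose placement is still the open question.
undecided = ["Ambari", "Hue", "Ranger", "Knox"]

for host, services in planned_layout.items():
    print(f"{host}: {', '.join(services)}")
print("undecided:", ", ".join(undecided))
```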
I have been following some blogs and articles in the Hortonworks community, but I am not completely sure how much the placement of services affects cluster performance.
Please share any good articles on how to configure these master services.
Thanks in advance :)