I would recommend:
- On DataNode hosts, place only the DataNode and NodeManager services. (These are the worker nodes and should be dedicated to data storage and data processing.)
- Place the ATS (App Timeline Server) on a node other than the NameNode or a DataNode.
- Use a dedicated server for the metadata databases of Ambari, Hive and Ranger.
- Place HiveServer2 on the gateway (edge) nodes (the nodes your end clients will use to interact with Hive).
- Focus on high availability for HiveServer2, NameNode (Active/Standby) and ResourceManager.
- Read up on and implement Hive performance tuning and LLAP, based on the current Hortonworks documentation.
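
To make the placement rules above easier to check against a planned layout, here is a minimal Python sketch. It is only an illustration: the hostnames (`master1`, `db1`, `edge1`, ...), the service labels and the `check_layout` helper are all made up for this example and are not part of Ambari or any Hadoop API.

```python
# Hypothetical layout: hostnames and service labels are illustrative only.
WORKER_SERVICES = {"DataNode", "NodeManager"}

layout = {
    "master1": {"NameNode", "ResourceManager"},
    "master2": {"StandbyNameNode", "StandbyResourceManager"},
    "master3": {"AppTimelineServer"},
    "db1": {"AmbariDB", "HiveMetastoreDB", "RangerDB"},
    "edge1": {"HiveServer2"},
    "worker1": {"DataNode", "NodeManager"},
    "worker2": {"DataNode", "NodeManager"},
}
EDGE_HOSTS = {"edge1"}


def check_layout(layout, edge_hosts):
    """Return a list of violations of the placement guidelines."""
    problems = []
    for host, services in layout.items():
        # Worker nodes should run only DataNode and NodeManager.
        if "DataNode" in services and not services <= WORKER_SERVICES:
            extra = sorted(services - WORKER_SERVICES)
            problems.append(f"{host}: worker also runs {extra}")
        # ATS should not share a host with the NameNode or a DataNode.
        if "AppTimelineServer" in services and services & {"NameNode", "DataNode"}:
            problems.append(f"{host}: ATS shares a host with NameNode/DataNode")
        # HiveServer2 belongs on the edge (gateway) nodes.
        if "HiveServer2" in services and host not in edge_hosts:
            problems.append(f"{host}: HiveServer2 is not on an edge node")
    return problems


if __name__ == "__main__":
    for line in check_layout(layout, EDGE_HOSTS) or ["layout follows the guidelines"]:
        print(line)
```

The check is deliberately simple: it only encodes the first, second and fourth bullets, and leaves high availability and tuning to the actual cluster configuration.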
These are general guidelines from my own personal experience and should be taken as such 🙂
Thanks for the reply
I do have one more question: is it necessary to configure RAID 10 on the master nodes? We have Dell servers with only 2 disks, and RAID 1+0 needs at least 4 disks.
We have an 11-node cluster plus 1 edge node. We are planning to configure 3 masters, one of which will be a standby, and 8 data/worker nodes. Please advise on how to proceed.
Thanks a Ton