Best Practices for Deploying Production Hadoop Cluster

Rising Star

I am planning to set up a 4-node production cluster on Azure VMs, with 1 edge node, 1 master node, and 2 slave nodes. I want to run the following services on that cluster:

1) Namenode

2) Oozie

3) DataNode

4) Yarn

5) Spark

6) Ranger

7) Atlas

8) Knox

9) Hbase

10) SAP Hana Vora

11) Zookeeper

I am looking for guidelines on the memory, cores, and storage required for each of the Hadoop services listed above. I need to buy 4 VMs on Azure, and I want to understand, from an infrastructure perspective, how much memory, how many cores, and how much storage would be optimal per service, keeping in mind that more services may be added in the future.

Is there any reference documentation/link?

Any help would be appreciated.

Thanks

1 ACCEPTED SOLUTION

Super Guru

@rahul gulati

See this article on best practices for deploying HDP on Azure: https://community.hortonworks.com/articles/22376/recommendations-for-microsoft-azure-hdp-deployment-...

For most production clusters, we typically recommend enabling HA for services. That requires a minimum of 2 master servers, although 3 would be better. You need 3 Zookeeper instances. While you can put Zookeeper on the data nodes, it would be better to put it on the master nodes.
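
To illustrate why 3 Zookeeper instances is the usual minimum: Zookeeper stays available only while a strict majority of its servers is up. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical, made up for this illustration, and are not part of any Zookeeper or HDP API.

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority of an ensemble with `ensemble` servers."""
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    # Compare small ensemble sizes side by side.
    for n in range(1, 6):
        print(
            f"{n} servers: quorum={quorum_size(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

Under that majority rule, 3 servers survive the loss of 1, 2 servers survive the loss of none, and a 4th server adds no extra tolerance over 3. That is why an odd count such as 3 is generally suggested.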


REPLIES

Rising Star

@Attila Kanto

Thoughts?

Rising Star

@Michael Young

Thanks! The article is very useful.