Best Practices for Deploying Production Hadoop Cluster

I am planning to set up a 4-node production cluster on Azure VMs, with 1 edge node, 1 master node, and 2 slave nodes. I want to run the following services on that cluster:

1) NameNode
2) Oozie
3) DataNode
4) YARN
5) Spark
6) Ranger
7) Atlas
8) Knox
9) HBase
10) SAP HANA Vora
11) ZooKeeper

I am looking for guidelines on the memory, cores, and storage required by each of the Hadoop services listed above. I need to buy 4 VMs on Azure, and from an infrastructure perspective I want to understand how much memory, how many cores, and how much storage would be optimal for each service, keeping in mind that more services may be added in the future.

Is there any reference documentation/link?

Any help would be appreciated.

Thanks

1 ACCEPTED SOLUTION

Re: Best Practices for Deploying Production Hadoop Cluster

@rahul gulati

See this article on best practices for deploying HDP on Azure: https://community.hortonworks.com/articles/22376/recommendations-for-microsoft-azure-hdp-deployment-...

For most production clusters, we typically recommend enabling HA for services. That requires a minimum of 2 master servers, although 3 would be better. You need 3 ZooKeeper instances. While you can put ZooKeeper on the data nodes, it is better to put it on the master nodes.
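To illustrate why 3 ZooKeeper instances is the usual minimum: a ZooKeeper ensemble stays available only while a strict majority of its servers are up. The sketch below is plain arithmetic, not a ZooKeeper API, and the function names `quorum_size` and `tolerated_failures` are made up for this example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare a few ensemble sizes (illustrative values only).
    for n in (1, 2, 3, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With 3 servers the ensemble survives the loss of one; with 2 it survives none, so a second server alone adds no fault tolerance over a single one.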

Re: Best Practices for Deploying Production Hadoop Cluster

@Attila Kanto

Thoughts?

Re: Best Practices for Deploying Production Hadoop Cluster

@Michael Young

Thanks! The article is very useful.