Hadoop best practices for production cluster set up

Contributor

Hi,

We are planning to set up HDP 2.4 on a 6-node production cluster with services such as HDFS, MapReduce, YARN, Hive, HBase, Oozie, ZooKeeper, Knox, and Ranger.

We also want to integrate all the nodes with AD using SSSD and enable Kerberos security.

The configuration details of these nodes are:

- 20 cores per machine
- 128 to 440 GB
- 36 TB storage

We are planning to enable HA. Any suggestions on how the services should be spread across the nodes?

Is it advisable to have the Ambari server and the NameNode on the same node?

Even if we handle NameNode failure by enabling NameNode HA, Ambari itself does not have HA. How do we handle the situation if the node where the Ambari server is installed goes down?

Is it advisable to have an edge node? What are the advantages of having one?

Can an edge node have a different OS than the cluster? I think it is not possible for the edge node to run a different OS version, since the clients need to be installed on it. However, we are using Red Hat 7 for the cluster and we want to use Hue. Since Hue is incompatible with Red Hat 7, the only option we can think of is an edge node running CentOS 6 with Hue installed on it. Is there any other way to achieve this?

If we have an edge node as the single point of access to cluster resources, do we still need to integrate AD with all the cluster nodes, or will integrating the edge node alone suffice (assuming Kerberos is enabled in the cluster)?

1 ACCEPTED SOLUTION

Re: Hadoop best practices for production cluster set up

Guru

In case you have to have an edge node that runs a different OS, I suggest a manual approach: install the client packages by following the manual install document. Once they are installed, to get the configs, you can install clients on another node and download the client configs. This has the obvious overhead of keeping these configs in sync with changes made in Ambari.
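
One way to keep an eye on the sync overhead mentioned above is to compare the configs on the edge node against a freshly downloaded copy. The following is only a minimal sketch, assuming both sets of client configs are available as flat local directories; the function names are made up for illustration and are not part of Ambari or HDP.

```python
import hashlib
from pathlib import Path


def file_digests(conf_dir):
    """Map each file name directly under conf_dir to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(Path(conf_dir).iterdir()):
        if path.is_file():
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def config_drift(reference_dir, edge_dir):
    """Return the sorted names of files that differ or exist on only one side."""
    ref = file_digests(reference_dir)
    edge = file_digests(edge_dir)
    return sorted(
        name for name in ref.keys() | edge.keys()
        if ref.get(name) != edge.get(name)
    )
```

Calling `config_drift(downloaded_dir, edge_conf_dir)` with two directory paths of your choosing returns the file names that need attention; an empty list means the two copies match byte for byte.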

The Ambari Server and the NN can be on the same node. Just make sure the NN metadata directory, JournalNode data, and ZooKeeper data go onto disk mounts that are not used by other services.
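
The disk-mount advice above can be checked mechanically. This is a rough sketch, assuming you supply the data directories and the list of mount points yourself; all paths and service names in it are hypothetical examples, not HDP defaults.

```python
import posixpath


def mount_for(path, mount_points):
    """Return the longest mount point that contains path, or None."""
    path = posixpath.normpath(path)
    best = None
    for mount in mount_points:
        mount = posixpath.normpath(mount)
        prefix = mount if mount.endswith("/") else mount + "/"
        if path == mount or path.startswith(prefix):
            if best is None or len(mount) > len(best):
                best = mount
    return best


def shared_mounts(data_dirs, mount_points):
    """Return {mount: [services]} for every mount used by more than one service."""
    by_mount = {}
    for service, directory in data_dirs.items():
        by_mount.setdefault(mount_for(directory, mount_points), []).append(service)
    return {m: sorted(s) for m, s in by_mount.items() if len(s) > 1}


# Hypothetical layout: yarn-local wrongly shares the NameNode's mount.
example_dirs = {
    "namenode": "/hadoop/nn/hdfs/namenode",
    "journalnode": "/hadoop/jn/hdfs/journal",
    "zookeeper": "/hadoop/zk/zookeeper",
    "yarn-local": "/hadoop/nn/yarn/local",
}
example_mounts = ["/", "/hadoop/nn", "/hadoop/jn", "/hadoop/zk"]
```

With the example data, `shared_mounts(example_dirs, example_mounts)` flags `/hadoop/nn` as shared by `namenode` and `yarn-local`; an empty result would mean each listed service has a mount to itself.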

The Ambari Server, being a management console on top of Hadoop, can be considered non-critical. If Ambari is down and you need to restart a service, you can use the manual approach while you work on bringing the Ambari Server back up. Unless you are using views, end users (not counting cluster admins as end users) will not be affected by the Ambari Server going down. If you are using views, you can run multiple Ambari servers (serving only views).

Re: Hadoop best practices for production cluster set up

New Contributor


@vinay kumar Very late to answer, but this may help others. Can you please post where you stand with your cluster now?

Is it advisable to have the Ambari server and the NameNode on the same node?

Ambari, being an administration tool, is recommended to run on the edge node (if you configure one).

Even if we handle NameNode failure by enabling NameNode HA, Ambari itself does not have HA. How do we handle the situation if the node where the Ambari server is installed goes down?

While you work on restoring the Ambari service, the individual services managed through Ambari remain accessible on their own port numbers.
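
To confirm that the services are still reachable on their own ports while Ambari is down, a simple TCP probe is enough. This is a minimal sketch; the port numbers are common Hadoop 2.x defaults and are an assumption here, so check them against your own cluster configuration. The function and dictionary names are illustrative only.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed default ports; your cluster may use different values.
DEFAULT_PORTS = {
    "NameNode UI": 50070,
    "ResourceManager UI": 8088,
    "HiveServer2": 10000,
    "Oozie": 11000,
}


def probe(hosts_by_service, ports=DEFAULT_PORTS):
    """Return {service: reachable} for each service that has a known port."""
    return {
        service: port_open(host, ports[service])
        for service, host in hosts_by_service.items()
        if service in ports
    }
```

For example, `probe({"NameNode UI": "master1.example.com"})` (a hypothetical host name) reports whether the NameNode web UI accepts connections. This only shows that the port is open, not that the service is healthy.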

Is it advisable to have an edge node? What are the advantages of having one?

An edge node is often neglected in design discussions, but having one is recommended because:

- It reduces load and resource competition on the NameNode, since client applications and admin tools are usually configured to run there.

- It acts as a staging area during data transfers into and out of the cluster.

Can an edge node have a different OS than the cluster? I think it is not possible for the edge node to run a different OS version, since the clients need to be installed on it. However, we are using Red Hat 7 for the cluster and we want to use Hue. Since Hue is incompatible with Red Hat 7, the only option we can think of is an edge node running CentOS 6 with Hue installed on it. Is there any other way to achieve this?

Edge nodes usually have different hardware requirements than master/slave nodes, so I think it can be done (I'm not sure).